One may think that taking a guess is unwise, perhaps from a prior notion about parameter degeneracy, local minimum solutions, or other concerns regarding Levenberg-Marquardt algorithms. These are legitimate concerns, and there is certainly a limit to how far off initial guesses can be, otherwise guesses wouldn't be necessary. However, it is much more common for people to attribute all problems to parameter degeneracy or local minimum solutions than to other, less obvious but more often responsible, causes. For example, neighboring contamination, incorrect sigma images, a sky level determined independently and held fixed at an incorrect local value, flatfield non-uniformity, the image fitting size, a subtle bug in one's code, etc., can all lead to incorrect and unexpected outcomes. If the wrong cause is blamed, it is unlikely that any algorithm, downhill gradient or otherwise, would find the correct solution, because the desired solution would not yield the best numerical merit, e.g. χ2. There are ways to figure out the true causes, and knowing how to make an accurate diagnosis is important both for getting the correct answer and for not performing ill-conceived follow-up analysis to ostensibly "correct the errors," which can lead to further problems or, worse, a false sense of security. For example, if the sigma image is wrong and one ascribes the systematic errors to parameter degeneracy, one may be led to try to correct the errors in the final fit parameters, yielding solutions that are far worse than correcting the problem at its source (as seen here).
In practice, even for a down-hill gradient algorithm like Levenberg-Marquardt, concerns about how good the initial parameters need to be are often overstated for galaxy fitting. Whereas people worry about finding initial guesses accurate to 30-50%, which is nice, the actual tolerances are often far larger: on the order of many hundreds or thousands, sometimes even tens or hundreds of thousands, of percent, depending on the signal-to-noise and the complexity of the model. Suffice it to say that, for single component Sérsic fits, if your guesses are off by less than 100%, they're probably fine. This is not always the case for multi-component fits of a single galaxy: the more components one uses, the more accurate the initial guesses need to be for the algorithm to converge well and quickly on a solution.
This discussion is not meant to persuade anyone that numerical degeneracies or local minima don't exist. Rather, with enough experience, the initial concerns usually give way to a more accurate understanding and, hopefully, better application of the software tool. There is no shortcut to learning the true versus imagined or preconceived limitations of the analysis technique except by trying, especially because galaxy profiles are complex things. However, there are some recipes and rules of thumb gained through experience:
One can do this via simple aperture photometry. The calculation doesn't have to be accurate and the region doesn't have to be very large (but it can be if one likes). Even the crudest estimate is usually good to much better than 1 mag, i.e. a factor of 2.5:

mag = -2.5 log10 [(total flux in a box - rough sky value × number of pixels in that box) / EXPTIME] + magzpt
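As an illustration only, the formula above can be sketched in a few lines of Python. Everything here is made up for demonstration (the array, box bounds, sky value, exposure time, and zeropoint are arbitrary, not from any real image):

```python
import numpy as np

def rough_magnitude(image, box, sky, exptime, magzpt):
    """Crude aperture-photometry estimate of a galaxy's magnitude.

    image   : 2-D array of counts
    box     : (y0, y1, x0, x1) slice bounds of a box around the galaxy
    sky     : rough sky value per pixel
    exptime : exposure time [s]
    magzpt  : photometric zeropoint
    """
    y0, y1, x0, x1 = box
    cutout = image[y0:y1, x0:x1]
    # Sky-subtracted total flux in the box, per the formula above
    flux = cutout.sum() - sky * cutout.size
    return -2.5 * np.log10(flux / exptime) + magzpt

# Toy example: a flat "source" of 10 counts/pix on top of a sky of 100 counts/pix
img = np.full((50, 50), 110.0)
mag = rough_magnitude(img, (10, 30, 10, 30), sky=100.0, exptime=1.0, magzpt=25.0)
```

Even with a rough sky value and a loosely drawn box, this kind of estimate is typically good to well within 1 mag, which is all an initial guess needs.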
The reason parametric techniques are very useful, and often better than non-parametric techniques, in low S/N situations is that parametric fitting is fundamentally about model testing. Model testing is a strength, not a weakness, because it can use both the signal and the noise information via a sigma image, combined together in χ2, in a rigorous and, just as importantly, proper manner. In other words, in low S/N situations, when we just see the "tip of the iceberg" of galaxy centers, model testing tries to figure out how much extended wing is allowed by the noise. Noise is information, and an important piece of information at that. If the noise is large, it doesn't mean that the model "doesn't see" the galaxy. Quite the opposite: it means that more and more extended models are consistent with the noise, which is why the errorbar on size becomes larger and larger. Indeed, it is incorrect to think that because we only see the tip, that must also be what a parametric algorithm "sees," and to expect the code to behave according to our intuition. Another factor contributing to the misconception is a belief that parametric analysis is highly sensitive to parameter degeneracy -- that critique is analyzed here and shown not to be well deserved in general. Lastly, it is common to think that parametric analysis needs to account for all the components to work properly. If one is interested in the physical meaning of the multiple components of a galaxy, then that is true, but that is so the interpretation is clear, not because the fit is otherwise somehow wrong in some abstract sense. For single component fits the meaning is clear: it is about finding a global model that minimizes the residuals. And that does not require the profile function to be perfect for the technique to work properly. It would certainly be better for the profile functions to be more accurate through multi-component fitting, to further reduce the residuals.
This can be done in an automated way if one does not need to know the meaning of the subcomponents, but only needs to measure the total flux and size.
To illustrate the above claims it is useful to turn to some simple simulations. Figures 1a, 1b, and 1c show three controlled experiments based on simulated galaxies, each created to have a bulge and a disk with a 1:1 luminosity ratio (left panels), with the parameters of the Sérsic profiles shown in the top two rows. When analyzing the data, we fit a single component Sérsic function (middle panels) to purposefully simulate a model mismatch that is often cited to be a main source of weakness of the technique (see residuals in the top right images). The simulation is repeated for several very high and very low S/N situations. In a single component fit, one sees that the size parameter falls roughly halfway between the bulge and disk sizes. This makes sense because the bulge-to-disk luminosity ratio is created to be exactly 1-to-1.
SIMULATIONS OF GALAXY SIZE
MEASUREMENTS UNDER LOW AND HIGH SIGNAL-TO-NOISE REGIMES
Figure 1a: a two component model with intrinsic parameters shown on the top row, outside the images. Top row: the first image (left) shows a model with a very high S/N of 5000 at the peak, the middle image shows a single component fit to the data, finding the parameters shown in the square box, and the third image (right) shows the residuals from the fit. Bottom row: images showing a similar sequence for the same galaxy, now with a very low S/N of only 10 at the peak. Despite the fact that the galaxies have very different S/N levels, despite the fact that a single component model is a bad fit to the data, and despite the fact that the disk component has completely faded below the noise in the second simulation, the size and concentration parameters are measured to be nearly identical to within the error bars given by GALFIT.
Figure 1b: See Figure 1a for details. Similar simulation as Figure 1a except with different bulge size and concentration index. Again, the sizes measured are essentially identical between low and high S/N simulations to within the errors, despite vast differences in S/N.
As we can see from the above simulations, the notion that model mismatch and S/N would preferentially bias parametric analysis toward finding smaller sizes and lower luminosity is not realized in practice. When statistics are properly taken into account in low S/N simulations, the noisier data are consistent with a wider range of acceptable models, which produces parameters that have larger errorbars than in high S/N simulations. But the central values are nevertheless consistent between very high and extremely low S/N situations to within the errorbar. In contrast, the discussion immediately below explains why non-parametric analyses, which often do not properly account for noise in the data, may cause one to underestimate galaxy sizes and luminosities.

Figure 1c: See Figure 1a for details. Similar simulation as Figure 1a except with different bulge size and concentration index. Here, although the size estimate of the fainter image is larger than for the high S/N image, it is still within the errorbar. Moreover, the direction is toward finding a larger size for the fainter image, contrary to expectations that fading of the outer regions causes the size estimate to be biased toward a lower value.
The above section explains why parametric techniques are useful in low S/N situations: they are premised on model testing.
In contrast, non-parametric techniques, like curve-of-growth (COG) analysis, are not about hypothesis testing. They often use the noise information merely to decide where to stop extracting a galaxy profile. As such, the technique indeed corresponds more to the notion of "what you see is what you get," because the noisier the data, the closer in to the center the galaxy seems to "end." COG and other non-parametric techniques often do in fact use the noise information, but they often don't use it properly. To be concrete, imagine trying to image a big, bright galaxy, say M31, with a very short exposure time, say a microsecond. The galaxy is there in the image, but at such a low level that the image is dominated by sky and readnoise. A curve-of-growth analysis usually compares the flux gradient in an image to the noise in order to determine where the galaxy "ends." It would therefore find a sky- and readnoise-dominated image to be nearly perfectly flat, to very high precision when the image is large. This would lead one to conclude, with high confidence, that the size of the galaxy is 0 arcseconds, based on careful statistical analysis of the background noise. This, of course, is the wrong answer. A parametric analysis would tell you that the size is consistent with some central value, but that the value has an uncertainty of ± ∞, which is another way to say that a model of 0 size is as good as one which is extremely large and potentially includes the correct answer. Being highly uncertain about an answer because of statistics is not wrong. However, being absolutely certain about an incorrect answer is wrong, and that is a potential pitfall of non-parametric analysis.
The relative benefits and drawbacks of parametric vs. non-parametric techniques are summarized below:
Curve-of-Growth (COG) Analysis (non-parametric): The COG technique uses apertures of growing radii to sum up the light from a galaxy. One then performs an analysis on a curve of flux vs. aperture radius. As the aperture grows, so does the (sky subtracted) flux, until nearly all the light is included and the COG approaches an asymptotic value. The asymptotic flux value is taken to be the total light from that galaxy, and the size is then defined to be the radius which encloses some fraction, often 50% or 90%, of the total light, or out to a certain surface brightness.
Parametric fitting: The size and luminosity parameters are built into the functional form of the light profile model.
As explained above in two different sections, and summarized here, there is a fundamental conceptual difference between COG and parametric techniques with regard to how one interprets the measurements given noise: in the limit where the exposure time is so low that no flux from a galaxy can yet be detected, the COG analysis would be 100% certain that the central value is 0, i.e. that there is no galaxy there, because the galaxy gradient is exactly 0. In the same situation, the parametric analysis would say that the parameters are completely uncertain, without a firm commitment on the central value, i.e. the errorbar would be large and include 0 flux as a possibility. The ability to properly interpret the data given the noise is why the hypothesis testing of parametric analysis is superior to non-parametric techniques.
SIZES OF GALAXIES BASED ON APERTURE PHOTOMETRY
From the above figure, it's clear that aperture photometry produces large systematic errors at low signal-to-noise (S/N) and when the galaxy profile is steep (red data points). Furthermore, notice how the systematic errors for the sizes trend downward, as explained above. To drive home the above points, what makes parametric fitting a superior technique is that it allows one to test the consistency of the model in the presence of noise, beyond our visual impression of where the galaxy "ends." Again, it is worth emphasizing that parametric fitting doesn't "see" in the same way that our eyes do.

Figure 2: Figure taken from Ichikawa, Kajisawa, & Akhlaghi (2012) showing sizes determined from aperture photometry using SExtractor. Left -- the ratio of measured half-light radius (r50) and model radius (r050) compared to the input magnitude (mk). One should not take the x-axis magnitude numbers literally because it is the signal-to-noise that really matters. Blue data points are for pure exponential profiles and red for pure de Vaucouleurs profiles. Right -- the same for the radius containing 90% of the light. Compare this figure with how much better parametric model fitting can do here.
The term "parameter degeneracy" means that there are multiple solutions to a problem that are statistically indistinguishable from each other. However, the term "degeneracy" has evolved to also mean local minimum solutions, or has been generalized even further to mean anything the code does that is not expected. Yet there are many ways for systematic biases to appear in data analysis that have nothing to do with parameter degeneracy issues. The most common examples are situations with neighboring contamination, a wrong sigma image, too small an image fitting size, a bad convolution PSF, a gradient in the sky due to flatfielding errors, etc. To understand where systematic errors come from and what can be done about them, we have to discriminate between errors that are caused by things in the data, as opposed to limitations in the software. Whereas things in the data can and must be dealt with by an end user (e.g. masking out or simultaneously fitting neighbors), software limitations mean that there's either a bug (e.g. pixel sampling errors, internal gradient calculation errors) or a conceptual limitation (e.g. parameter degeneracies, conceptual flaws in implementation). Before delving into how things in the data can cause systematic errors, it is worth discussing first the tests that have been done to verify the software itself.
In fact, what really matters is not so much the number of parameters as the type of parameters and the situation. For instance, a million well isolated stars in an image can be fit just as robustly as a single star, with no local minima, because all the parameters are decoupled. Whereas a single component fit using both the axis ratio q and Fourier m=2 and 4 modes at the same time may have many mathematically equivalent solutions, even though only three parameters are involved. In short, being cognizant of what one is doing often goes a long way toward undermining the supposed advantages of using annealing or Metropolis techniques, which for galaxy fitting have been mostly speculated rather than demonstrated, to my knowledge.
Nevertheless, it is important to address some fundamental questions when dealing with highly complex situations: (1) can GALFIT robustly fit multiple component models to find the true global minimum, (2) can it do so without introducing systematic biases in parameter measurements, and (3) are the final results sensitive to the initial guesses for the parameter values? These questions are particularly relevant to GALFIT because one reason for using GALFIT is its versatility, but versatility is meaningful only if one can get right answers. Versatility is needed, for example, to deblend components, whether in a single object or in crowded environments. This involves fitting multiple, often overlapping, objects simultaneously. Although GALFIT has been tested in these regards on an object-by-object basis, generalizing such anecdotal evidence more broadly is difficult. A "proof" requires performing industrial-scale image fitting simulations. And this is only feasible when there is machinery to analyze large volumes of galaxy survey data in an automated way, using all the capabilities of GALFIT. Recently, this has become possible through a new algorithm, GALAPAGOS (created by Marco Barden -- Barden et al. 2012, MNRAS, 422, 449), developed in the course of the GEMS project (Galaxy Evolution from Morphology and SEDs, PI: Hans-Walter Rix, 2004, ApJS, 152, 163).
To test GALFIT on large-scale, crowded fields of galaxies, my colleagues Dr. Marco Barden and Dr. Boris Häussler devised and conducted many sets of rigorous simulations using an automated data analysis pipeline (GALAPAGOS, Barden et al. 2012). The results I'm discussing below come from a paper by B. Häussler et al. (2007). Dr. Boris Häussler, whose Ph.D. thesis partly involved trying to break GALFIT and GIM2D through punishing use, created the most challenging simulations of their kind to analyze: the simulations comprised roughly 80,000 galaxies that span a range of parameters in signal-to-noise, size, surface brightness, and axis ratio. These objects were placed randomly into ACS images (800 per image), as shown in Figure 3. He then applied GALAPAGOS (a master wrapping script that runs GALFIT), which performs object detection (using SExtractor), extraction, fitting, and lastly, data sorting. The object densities are quite high: an object has a 50% chance of being located near one or more galaxies which have to be simultaneously fitted/deblended (see Figure 3, especially the left hand panel). And the crowding can range anywhere from 2 to 12 objects. To compare the fitting results produced by GALFIT, the same simulated images were fitted independently by Daniel McIntosh using the GIM2D software. In contrast to GALFIT, which requires GALAPAGOS for automation, GIM2D was specifically designed and optimized to perform automated galaxy fitting; moreover, GIM2D uses the Metropolis algorithm to determine the best fit.
CROWDED GALAXY SIMULATIONS
Figure 3: from Häussler et al. (2007): Left -- Crowded galaxy simulations. The locations and sizes (the circles' radii = galaxy effective radii) of n=4 objects in a typical simulated image. Right -- n=1 (black) and n=4 (grey) simulated galaxies distributed like one of the GEMS images, to illustrate the severity of crowding in the n=4 simulations.
In Häussler et al. (2007), they apply GALAPAGOS (Barden et al. 2012) to analyze galaxy images by using both simultaneous fitting and object masking, based on whether objects are overlapping or well separated in an image. Furthermore, hard decisions about which objects to simultaneously fit or mask out for GALFIT are made in a fully automated manner so there is no human input once the program gets going. The fully automated data analysis allows for testing under the most realistic conditions of galaxy surveys. From these simulations the following key questions about the robustness of complex analysis using GALFIT are addressed:
COMPARISON OF INPUT MODEL RADIUS vs. INITIAL GUESS
Figure 4: Comparisons of galaxy radii definitions (courtesy of Marco Barden). This diagram was used to determine how to convert the SExtractor galaxy radius (x-axis, defined in some useful way) into an initial guess for the effective radius (y-axis) for GALFIT to start iterating on. The red data points are for de Vaucouleurs (n=4) galaxies whereas white data points are exponential profile (n=1) models. The blue line is the initial guess for the effective radius input into GALFIT. Despite the correlation between the cloud of points and the solid line, notice the large difference between the known input model parameters and the proxy flux_radius parameter, especially for n=4 models. Then, compare the large scatter in this figure with the much reduced scatter in Figure 5 below, which shows simulated vs. GALFIT-recovered values.
Häussler's study shows that object deblending in a fully automated way not only can be done robustly using a downhill-gradient type algorithm, with no systematics if the neighboring contamination is rigorously accounted for, but that doing so is essential in crowded regimes. The act of masking out objects, by itself, in crowded areas cannot fully get rid of the neighboring contamination that causes systematic errors in the fit (see Figure 7, in the section on neighboring contamination).
SIMULATIONS OF n=1 (EXPONENTIAL PROFILE, LEFT) and n=4 (DE VAUCOULEURS PROFILE, RIGHT) MODELS
Figure 5: from Häussler et al. (2007): Top panels -- the difference between magnitude recovered and simulated values. Middle panels -- the ratio of recovered and simulated half-light radius. Bottom panels -- recovered Sérsic concentration index. Top x-axis -- average surface brightness within the effective radius, in magnitude per square arcsec. The vertical line is the surface brightness of the sky level, and the grey line that runs through the data points is the median of the scatter.
In the literature, there have been two refereed papers which attributed some systematic errors to GALFIT. Upon follow-up communication with the authors, I discovered that one was caused by the sky not being flat in their data (which is quite difficult to deal with in some images), and the other was caused by not properly comparing or using the code (in my opinion), as discussed here, here and here. Some situations are in fact very difficult to know about or avoid, even for people experienced in data analysis. However, if there are known problems in GALFIT that may produce systematic errors, the code will be quickly updated and the issue brought to public attention via the main webpage and a user list.
The importance of using a correct sigma (weight) image cannot be overemphasized. The most natural weighting scheme in a photon "counting experiment" is Poisson statistics. Changing the definition of sigma can lead to nicer behavior in some cases, but unpredictable behavior in others. The only way GALFIT "knows" that an object has high or low signal-to-noise is through the sigma image. Thus a problem with the input weight map that wrongly weighs the bright and faint pixels can lead to a bias in the fitting parameters. For instance, a sigma image with a constant small value would give too much emphasis to the core of a galaxy, effectively giving the core a higher S/N than normal. This has the tendency to make GALFIT under-measure the size and luminosity of a galaxy. Keep in mind that because the sigma in flux goes like √f, where f is the count value in units of [electrons], near the center of a galaxy sigma should be higher than in the outskirts, even though the fractional noise (σf / f, i.e. relative to the peak flux) is lower in the center.
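As a sketch of the Poisson idea (this is not GALFIT's internal recipe; the function name and the gain/readnoise handling are illustrative assumptions), a simple sigma image can be built like this:

```python
import numpy as np

def sigma_image(image, readnoise, gain=1.0):
    """Per-pixel 1-sigma uncertainty for a photon counting experiment.

    image     : 2-D array of counts [ADU]; image*gain gives electrons
    readnoise : detector read noise [electrons, RMS]
    gain      : electrons per ADU
    """
    electrons = np.clip(image * gain, 0.0, None)   # negative pixels carry no Poisson term
    variance = electrons + readnoise**2            # Poisson variance + read noise, in e-
    return np.sqrt(variance) / gain                # convert back to ADU

# Toy example: a uniform patch of 100 ADU with gain 2 e-/ADU and 5 e- read noise
patch = np.full((4, 4), 100.0)
sig = sigma_image(patch, readnoise=5.0, gain=2.0)
```

Note that a bright galaxy center gets a larger sigma in absolute terms but a smaller fractional error, which is exactly the behavior described above; a constant sigma image would destroy that balance.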
Users are well advised to spend some time to familiarize themselves with the concept of a sigma image.
DIFFERENCES BETWEEN IRAF/MKOBJ AND GALFIT MODELS
Figure 6: A differenced image of (IRAF/mkobject - GALFIT) models, in positive grey scale, i.e. white = bright.
From the picture it is clear that the biggest disagreements between GALFIT and IRAF/mkobject are the following: the flux right at the central pixel doesn't agree, nor does it in an annulus several tens of pixels in radius immediately surrounding it; IRAF/mkobject appears to create a profile with pixellated patterns in that region. There's a discontinuity in the IRAF/mkobject profile due to outer truncation (set by the dynamic range). Just within the truncation, where pixel sampling differences are not a problem, the residuals are slightly elevated, due in part to fitting an IRAF/mkobject model that's truncated, and in part to an effect of the IRAF/mkobject aperture correction. The degree of disagreement varies depending on the IRAF/artdata settings, but they never agree at the central pixel. Unfortunately, it is not easy to figure out how to set the IRAF/mkobject parameters to produce consistent and reliable agreement to better than 5-20%. Adjusting the sampling scheme to much finer values in GALFIT produces negligible differences.
To make sure that the problems shown above are not caused by GALFIT, comparisons were done by several people using different means (IDL, MIDAS, GIM2D) to generate or integrate up a galaxy profile. In the end, they confirmed that there appear to be essentially no systematic errors in GALFIT model profiles, to high accuracy. So no effort will be made to make GALFIT models agree with IRAF/mkobject.
As a side note: there is a study published in A&A early in 2006 which finds quite significant systematic errors in GALFIT models (see below). On examining what they've done, it's clear that they used IRAF/mkobject to generate their baseline comparison models. Their results are also affected by the neighboring contamination and sky fitting issues mentioned above, leading to systematic and random errors that are larger than they ought to be under proper analysis.
Taking a step back: given a model that has parameters one can change, χ2, or least-squares, fitting involves finding the parameters that yield the lowest value of χ2. Practically speaking, the quantity χ2 is simply a way to measure how large the difference is between the model and the data overall, i.e. Δf = fmodel - fdata. A model can be as simple as a straight line, or something much more complicated like an N-dimensional function. Going one step further, the χ2 used to decide on goodness of fit has a very specific, i.e. statistical, meaning. It is a ratio of two quantities: the Δf between model and data, compared with how large a typical deviation, σ, is supposed to be based on Poisson statistics at that point, summed over all data points: χ2 = Σi (Δfi / σi)2. Obviously, the larger the deviation Δf compared to σ, the poorer the match between the model and that point, but it is σ that provides the statistical sense of "bad by how much?" at that particular point. This idea of maximizing the probability over all data points, not just one, leads to the idea of minimizing χ2, so χ2 is the merit function that determines when to stop fitting. One important observation here is that, as far as χ2 statistics is concerned, being a single scalar number, χ2 does not know whether the data are in the form of a 2-D image or a 1-D surface brightness profile. In fact, it is indifferent to the dimensionality of the data, spatial or otherwise. Therefore it is important to recognize that the issue of interest here ("is 1-D fitting more sensitive than 2-D?") is inherently not one of 1-D vs. 2-D. Instead, it is about whether it is better to fit a few high signal-to-noise points, inferred from averaging over subsamples of data points to reduce the noise, or to fit all the data points collectively without averaging them first (assuming here that all the data points in each bin fluctuate around the same mean).
Because the issue is not about 1-D vs. 2-D, we can simplify the discussion by referring to a simple example: consider fitting a line to a set of N pixels that fluctuate around a constant value, i.e. f(x)=C electrons, over some arbitrary abscissa (x-axis). A good example is the sky in an image. At this point, the relevant question is, does the final answer in determining the mean value, C, depend on how you group the pixels (e.g. fewer vs. more bins) when doing the fit, if you use up all the data?
The answer lies in the definition of χ2 itself, as one notices that the denominator involves the effective uncertainty (σi) at each data point (datai). Specifically, if you average over N data points around some mean value <f(x)>=C, the effective σeff=<σi>/√N, and the inverse variance of the mean is: 1/σeff2=N/<σi2>. Compare this with the form of the χ2 equation to notice that this is exactly the form of the denominator involving σi, when summed over N pixels with the same mean. Because of this, χ2 statistics is oblivious to the exact composition of the data points being fitted, provided the error propagation is done correctly. For a more rigorous exposition of this topic, please see Numerical Recipes (Press et al. 1997), Chapter 15. The bottom line is that in least-squares fitting, it doesn't matter whether the points being fitted are obtained by averaging noisier ones *first*, or by fitting all the noisy pixels together: they produce the same result.
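This equivalence is easy to verify numerically. A small sketch (synthetic data; the pixel count, bin count, and noise level are arbitrary choices) fits a constant f(x)=C to the same data both ways: once using every noisy pixel, and once after averaging the pixels into bins first, propagating σ accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)
C_true, sigma_pix, N = 50.0, 4.0, 10000
data = rng.normal(C_true, sigma_pix, N)        # N noisy "pixels" around C

# Fit 1: chi^2 minimum for f(x)=C over all N pixels is the weighted mean
w = np.full(N, 1.0 / sigma_pix**2)
C_all = np.sum(w * data) / np.sum(w)
err_all = 1.0 / np.sqrt(np.sum(w))             # 1-sigma error on C

# Fit 2: average into 100 bins of 100 pixels first, then fit the binned points
binned = data.reshape(100, -1).mean(axis=1)
sigma_bin = sigma_pix / np.sqrt(100)           # sigma_eff = sigma / sqrt(N_bin)
wb = np.full(100, 1.0 / sigma_bin**2)
C_binned = np.sum(wb * binned) / np.sum(wb)
err_binned = 1.0 / np.sqrt(np.sum(wb))
```

Both routes return the identical best-fit value and the identical errorbar, because the total inverse variance Σ 1/σi2 is conserved by the binning when the error propagation is done correctly.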
Poisson statistics make 1-D and 2-D techniques at least formally equivalent in the regime where the flux is relatively constant and the profiles are well behaved in a Poisson sense. However, if the flux distribution has too high a curvature within an annulus, then 1-D techniques would actually infer a *larger* σ(R) at each radial point R, making the mean flux less well determined at R. This is because, in 1-D fitting, the effective σ(R) must now reflect the non-Poisson RMS of the flux distribution in the annulus R±dR, on top of the intrinsic Poisson fluctuations at each pixel. This has the effect of underweighting the central region of a galaxy, where the profile curvature is larger than in the outskirts.
The above discussion does bring up one marginal benefit of the 1-D technique over 2-D. In 1-D fitting, σ(R) is often determined from pixel fluctuations inside an elliptical annulus -- i.e., once you draw an elliptical annulus, the RMS fluctuation inside the annulus is your σ(R), if the gradient is not too steep. This entirely bypasses the need to provide the external sigma image required in 2-D fitting. It has one nicety: since the RMS fluctuations may be non-Poisson, 1-D fitting should, in principle, result in more believable errorbars than 2-D fitting. However, this observation sweeps under the rug a requirement for statistical independence between the isophotal rings. In fact, the isophotes are often highly correlated from one to the next: the current isophote is contingent on the one that came before. In 2-D, there are clever ways to deal with this issue, and to create more realistic sigma images, which will be the subject of future code development.
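The annulus-RMS estimate of σ(R) can be sketched as follows (a circular annulus for simplicity rather than an elliptical one; the function name is illustrative). On pure noise it recovers the pixel sigma, and on correlated or otherwise non-Poisson data it captures the extra variance that a purely Poisson sigma image would miss:

```python
import numpy as np

def annulus_rms(image, center, r_in, r_out):
    """Empirical sigma(R): RMS of pixel fluctuations in the annulus r_in <= r < r_out."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(yy - center[0], xx - center[1])
    pixels = image[(r >= r_in) & (r < r_out)]
    return pixels.std(ddof=1)   # sample standard deviation of the annulus pixels

# Sanity check on a pure-noise image with known sigma = 3
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 3.0, (201, 201))
sig_r = annulus_rms(noise, (100, 100), r_in=20, r_out=30)   # should be close to 3
```

Keep in mind the caveat from the text: if a real galaxy gradient runs through the annulus, or the rings are correlated, this RMS is inflated by more than pure noise, which is precisely how σ(R) picks up non-Poisson information.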
In summary, a claim that is sometimes heard about the benefit of 1-D fitting over 2-D when it comes to surface brightness sensitivity is not correct, from a statistical point of view.
When it comes to comparing how 1-D vs. 2-D techniques do on irregular/peculiar galaxies, common notions about what "better" means are often vague and contradictory. For instance, one thought is that the technique must be able to (1) break up a profile into different isophotes, so as to follow the course of isophotal twists, and (2) average over large numbers of pixels in annuli so as to overcome perturbations by peculiar features. Thinking more about it, the ability to track detailed isophotal behavior is only interesting if the behavior is meaningful. For irregular/peculiar galaxies, as there is no objective definition for what amount of perturbation constitutes irregularity, the ability to break up isophotes is moot when irregularities exist on all scales. There is also a conceptual problem in the very definition of a "radial profile" when the isophotes may be changing and shifting about erratically in a free extraction (allowing all isophotal parameters to vary). In such galaxies the same pixels may be used multiple times, as "isophotes" overlap through twisting, flattening, or dislocating, so that there may not be a unique 1-D profile or even a common center for the isophotes. So, while it is easy to be drawn in by the apparent simplicity and order of 1-D profiles, this simplicity hides away difficult decisions about how to properly extract the profile, or what the profile means. Thus, when it comes to solving the problem of peculiar features, the "advantages" touted by 1-D proponents are mostly speculative. In truth, no one knows how important these problems are and, even if important individually, to what extent they matter in the end to the science question at hand. Without being able to define or quantify what is to be achieved, discussions about which method is generally better for arbitrarily peculiar systems are a bit circular.
It is clear that to know what "better" means, a goal has to be well defined. And the bottom line for most people when they do profile fitting is to obtain a set of simple numbers, for instance, luminosity, size, concentration, and to a much lesser extent the axis ratio and position angle. In terms of those principal parameters, the interesting fact is that there are *regularities* in most galaxies. Indeed, that is the main lesson gleaned from 1-D profile extraction. Even in irregular/peculiar galaxies, three such regularities hold: they have an average size; an average profile that falls off with radius; and, by definition, some kind of average "center." Indeed, the notion of "average" is the key, such that over sufficiently large regions even irregular galaxies have well behaved properties. It's not hard to convince oneself of that -- all one has to do is smear out an irregular galaxy image modestly with a Gaussian. It is this averaged regularity that is often of interest, and it is precisely in this averaging sense that 2-D fitting using ellipsoids (which, fundamentally, is a weighted *average* over an elliptical kernel) doesn't do such a bad job, despite the shortcomings of its shape. It is also why detailed information about isophotal twists obtained in 1-D extraction does not have as large an impact as one might surmise. Nonetheless, it is true, as 1-D proponents argue, that the fixed center of the ellipsoids makes traditional 2-D fitting rigid, which might then affect an estimate of the central concentration and slope of a galaxy, even if the size and luminosity, on average, are O.K. (as enforced by least-squares fitting). But, while 1-D proponents argue this to be a problem, exactly *how big* a problem has never been shown, because "compared to what" has never even been defined. Nonetheless, the answer is actually not much, as will be argued (but not rigorously) below.
And to the extent that galaxy morphology is a messy business to begin with, and even more so for irregular galaxies, 2-D algorithms are probably O.K. in an average sense.
To quantify what O.K. means we have to define a comparison baseline. There are several ways to do this. But the baseline should not be to use 1-D profile results as ground "truth" if the goal is to test whether 2-D is more robust than 1-D (there has been one study in the literature which used this circular argument, and one should be aware of it). Instead, what is more interesting to see is which one is more internally consistent. That kind of comparison is very much beyond the scope of this FAQ and I will not do so here. Instead, I illustrate that plain 2-D ellipsoids don't do too badly even on rather irregular-ish galaxies, compared to a much more sophisticated technique.
To do so, a much more complex fit is performed by GALFIT Version 3.0, which will be used as the baseline. GALFIT 3.0 allows the 2-D algorithm to break from axisymmetry completely, while retaining the useful traditional parameters such as effective radius, Sérsic index, axis ratio, and the ability to decompose multiple components. Many such examples are shown here. But the main point is that 2-D fitting is not limited to using ellipsoids. In fact, 2-D fitting can be extremely realistic, with lopsidedness, spiral structure, and so on. (Which, by the way, also defeats another common argument for preferring 1-D fitting over 2-D.)
However, for the purposes of this section, I will illustrate by way of three examples why ellipsoids are "not that bad." The three examples are VII Zw 403, NGC 3741, and II Zw 40, which are found here. These examples show that, despite the irregularity and complexities involved, a single component (except for II Zw 40) using new features does a rather nice job of tracing the general galaxy shape (lopsided, peculiar, or otherwise). Now, when a traditional ellipsoid is fitted to these galaxies, the differences in size, luminosity, and Sérsic index values are about 2-5% -- all except for the extreme case of II Zw 40, which has an error in size of about 15%. Having now fitted many irregular-looking galaxies, this trend appears to be fairly representative of the differences between naive ellipsoids and more sophisticated 2-D fitting on peculiar/irregular galaxies. Therefore, in terms of *internal consistency*, 2-D techniques using traditional ellipsoids are not bad at all, even on galaxies of questionable propriety for using ellipsoids.
It is worthwhile, first, to consider why it is surprising and counter-intuitive to claim that 1-D techniques are superior or more robust, all else being equal (experimental factors such as image fitting size, masking, etc.). How well 1-D analysis can be done depends on how well 1-D profiles can be extracted from 2-D images. If a galaxy is embedded in a sea of background sources, common sense says that 1-D techniques have no way to rigorously deblend a neighboring galaxy, and the only recourse is to use "masking." But masking only goes so far, and cannot remove light which continues to extend beyond the mask. Therefore, the more crowded the environment, and the fainter the object of interest, the more problematic neighboring contamination becomes for extracting a 1-D profile. In the fairly crowded limit, no amount of masking will do without blocking out the whole galaxy one is interested in fitting. As explained below, image fitting is especially sensitive to neighboring contamination because galaxy wings can be highly extended, which is one reason why accurate sky determination is sometimes difficult but very important. As also discussed above, both 1-D and 2-D techniques are equally sensitive to the faint extensions of galaxy profiles. Furthermore, Figure 7 clearly shows that object masking alone does not suffice to remove the influence of neighboring contamination, the extent of which depends both on the luminosity of and distance to a neighbor. Since accurate treatment of neighboring contamination is key for obtaining accurate results, it would seem sensible that a 2-D technique might be beneficial for this reason. In 2-D fitting, nearby neighbors can be simultaneously optimized and fitted to remove the contamination that can't be masked out.
Indeed, detailed, crowded, simulations of over 40,000 galaxies, and Figure 7, suggest that simultaneously fitting the neighbors, coupled with masking, is definitely more robust compared to no treatment of neighbors or only using object masking alone, especially when there are multiple neighbors which need to be simultaneously fitted.
So where did the notion that 1-D fitting methods are more robust than 2-D techniques come from? Here are some instances that have caught my attention -- errors one should be particularly aware of when judging claims of that sort:
NEIGHBORING CONTAMINATION ON GALAXY FITTING
Figure 7: How neighboring contamination affects the recovered galaxy luminosity, from Häussler et al. (2007). Top panels -- GALFIT simulations, showing that the magnitudes can be recovered reliably when neighbors are fitted away simultaneously with the primary targets. Middle panels -- GIM2D simulations, using object masking to reduce the influence of neighboring contamination; systematic errors remain even so. Bottom panels -- GIM2D simulations, after subtracting the median values from the Middle panels ex post facto.
Figure 8a: Image of quasar PKS 0736+01, from Bennert et al. (2008, ApJ, 677, 846). Original data from the Hubble Space Telescope showing the quasar+host galaxy and the surrounding area. The inset shows the same image but with a deeper stretch.
Figure 8b: This surface brightness profile is constructed by first fitting the data image in 2-D (Poisson weighting), then 1-D profiles are extracted in some way, e.g. by running IRAF/ellipse on the 2-D images of the data and the model independently. The fit is not done in 1-D; it is only a visualization. The square points are extracted from the data with some pixel masking, the dashed lines are the bulge and disk model components, whereas the solid line is the sum of the two components. The lower window shows in more detail the disagreement (difference) between the data and the model. The blue line is the galaxy profile if there is no mask used.
Figure 8c: The residual image corresponding to the surface brightness profile plot of Figure 8b (in the spacecraft orientation).
If you were to do the analysis using uniform weights, indeed, it does seem to produce the desired effect of reducing the outer wing residuals, as long as the data are displayed as a 1-D plot. So what's wrong with believing that giving uniform weights means the outer region has higher weighting?
It might surprise most people to hear that, quite contrary to their belief, weighting all pixels uniformly actually gives higher weight to the bright central regions of an image in 2-D fitting. The explanation for this statement is most easily seen from the definition of the χ² equation, in which the weight sigma, σ, is in the denominator. The larger the sigma, therefore, the smaller the weight. So if you don't want a fitting algorithm to "see" a pixel or a region of an image, you make σ really, really large there; that region then contributes essentially zero to the χ² sum. Under uniform weights, a given fractional residual on a bright pixel contributes (flux_bright/flux_faint)² times more to χ² than the same fractional residual on a faint pixel, whereas under Poisson weights the factor is only flux_bright/flux_faint. In other words, uniform weighting boosts the importance of the bright pixels relative to the faint ones.
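This can be made concrete with a small sketch in pure Python. The flux values and the 10% fractional mismatch below are hypothetical, chosen only to illustrate the weighting argument:

```python
# Sketch: one pixel's chi^2 contribution under Poisson vs. uniform weights.
# Flux values and the 10% fractional mismatch are hypothetical illustrations.

def chi2_term(residual, sigma):
    """One pixel's contribution to chi^2: (residual/sigma)^2."""
    return (residual / sigma) ** 2

flux_bright, flux_faint = 10000.0, 100.0
frac = 0.10                                   # same 10% model mismatch on both pixels

# Poisson weights: sigma = sqrt(flux)
poisson_bright = chi2_term(frac * flux_bright, flux_bright ** 0.5)
poisson_faint  = chi2_term(frac * flux_faint,  flux_faint ** 0.5)

# Uniform weights: sigma = 1 for every pixel
uniform_bright = chi2_term(frac * flux_bright, 1.0)
uniform_faint  = chi2_term(frac * flux_faint,  1.0)

# The bright pixel dominates by the flux ratio under Poisson weighting,
# but by the flux ratio *squared* under uniform weighting.
print(poisson_bright / poisson_faint)         # ~100
print(uniform_bright / uniform_faint)         # ~10000
```

The bright pixel out-weighs the faint one either way, but uniform weighting amplifies its dominance by another full factor of the flux ratio.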
So then, why does making all weights uniform produce a more desirable fit to the wing of the galaxy, at least to the eye in a 1-D plot? The first of two main reasons is exactly that, when more weight is given to bright pixels, the central region is fit better than the outer regions. This causes the sky (which is also fitted as a free parameter) to be elevated to a higher value, because the sky is a degree of freedom that can be used to minimize the residuals in the inner region. In 2-D fitting, this causes the wings to be over-subtracted more than with Poisson weighting, thereby artificially truncating the galaxy profile; in the extreme where the outer points don't matter at all, the effect is the same as shrinking the image fitting region. Now consider the process of extracting a 1-D profile from a 2-D image: one has to first subtract out the sky -- and the sky being removed is now a higher value than before, which makes the 1-D profile "look" like the fit got rid of the wing better, but not for the reason you think. The second reason is psychological: most often, when the data points (square points) lie above the model fit in the wings, as shown in Figure 8b (solid line at R > 18 arcsec), people judge this to be evidence that the central region is over-weighted because, as they reason, the model does not "see" the flux in the wing. Whereas, if the data points were to lie *below* the model, our inclination is to explain away the disagreement as a "break" in the exponential profile, due either to tidal truncation or spiral arms; the disagreement therefore seems less troubling. Indeed, the upshot is that when there is excess flux in the wings, people usually blame the weights, but less so when there is a deficit.
But, doesn't the bottom line that the agreement *seems* better between the 1-D profiles argue for using uniform weights? The answer is still no, and we now discuss why by asking the following question.
Still, this begs the question: why shouldn't GALFIT fit that excess flux by raising the sky further to achieve a better fit? The answer is that a "better fit" in 1-D can actually be a worse fit in 2-D. Sounds crazy, but read on. In Figure 8c, I show an image where the target of interest, originally at the center, has been subtracted out; this is the residual image corresponding to the surface brightness profile shown in Figure 8b. From Figure 8c, it is clear there's a lot going on -- there are many neighboring galaxies and stars, which are discrete and localized, as opposed to being diffuse emission from the central galaxy, so their profiles can't be modeled if they are not fitted simultaneously. Often, the way to deal with neighboring contamination is to use masking, as was done in this case. Nevertheless, it is quite hard to do so fully, especially at large radii where the contamination comes from quite extended objects. For comparison, there is a small green circle located about 30 arcsec to the left of the center of the image (find this radius also in the surface brightness profile above). The circle encloses a region where the average sky brightness is roughly zero counts. That is to say, areas of the image darker than the inside of the green circle are over-subtracted, whereas regions that are brighter are under-subtracted. In the 1-D profile, the reason for the rising profile beyond 18 arcsec is that the neighboring objects contaminate the wing of the primary object at the center (now removed). Therefore, in 2-D analysis, raising the sky value to fit the neighbors would in turn over-subtract a much larger portion of the real sky in the image: chasing the 1-D excess would always over-subtract the image, because the fit is done in 2-D. Displayed in 1-D, the profile will instead show an excess, depending on how the pixels are rejected and averaged in the extraction.
From this image, it is then intuitively obvious why GALFIT cannot extend the wing of the exponential disk component to "better subtract" larger regions: doing so would over-subtract a much larger region of the image to try to account for neighbors which are not part of the object of interest. The information about whether the "excess" flux in the outskirts comes from the galaxy of interest or from neighbors is completely lost in 1-D profiles. Because of the spatial information in 2-D, and because 2-D models use strong priors -- i.e. that the isophotes (ellipticity, PA) of the host galaxy profile are concentric and the profile radially declining -- the fitting algorithm can more effectively "deduce" whether the contamination is due to the primary or to neighboring objects, above and beyond simple masking alone.
So the main point is that a disagreement between a 1-D profile and a 2-D model is not always evidence of a poor fit in 2-D: 1-D profiles are obtained by averaging within annuli, where bright neighboring galaxies have disproportionately large contributions wherever the surface brightness of the main galaxy is low. Whereas, in 2-D fitting, contaminating galaxies at large radii from the primary have only so much leverage on the profile, because it is also constrained by the shape and orientation parameters. This, in principle, produces a more realistic fit, and is one of the virtues of 2-D fitting compared with 1-D. This sounds quite a bit like saying that the model is better than the data at determining what is right. However, keep in mind that the 1-D profile depends on the sky subtraction. So in some sense, the 1-D profile of "the data" is not pure data -- it depends on the model of exactly how the isophotes are extracted from a 2-D image.
The bottom line is that the only way to know whether Poisson weights are a problem is to run simulations to make sure there are no systematic biases in situations similar to that shown in Figure 6. This is discussed in the section on systematics of galaxy fitting. It is one of the reasons for not preferring uniform weighting to Poisson weighting, even when there is quite significant object contamination.
Suggestion: try starting out with an image whose diameter is at least twice the diameter of the galaxy. Once the fit converges, enlarge the box to 2-3 times that size. Once that converges, enlarge the box once more by another factor of 2-3. The reliability of the fit, of course, is tested by seeing that the parameters stop changing much with growing fitting area. If the fitting parameters continue to change significantly with increasing fitting box size, then the cause has to be tracked down. If the image is already very large but the parameters keep changing, that usually means there's a significant profile mismatch, which is causing the Sérsic index to misbehave, or there's significant neighboring contamination. A typical solution is to hold the sky value fixed in the fit, or to mask out the neighboring contamination.
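The box-growing procedure above can be sketched as a simple convergence loop. Everything here is schematic: `run_fit` is a hypothetical stand-in for whatever driver runs GALFIT on a given box and parses back a fitted parameter (here, a fake effective radius that settles as the box grows):

```python
# Schematic convergence loop for choosing the fitting-box size.
# run_fit() is a hypothetical placeholder for a driver that runs GALFIT
# on a box of the given diameter and returns, e.g., the effective radius.

def run_fit(box_diameter):
    # Placeholder: pretend the fitted parameter approaches its true value
    # as the box grows (a real driver would call GALFIT here).
    return 10.0 + 50.0 / box_diameter

def converge_box(galaxy_diameter, tol=0.01, max_grow=6):
    box = 2 * galaxy_diameter                # start at twice the galaxy diameter
    previous = run_fit(box)
    for _ in range(max_grow):
        box *= 2                             # enlarge by 2-3x each round (here 2x)
        current = run_fit(box)
        if abs(current - previous) / abs(previous) < tol:
            return box, current              # parameters stopped changing: done
        previous = current
    raise RuntimeError("parameters keep changing; suspect profile mismatch, "
                       "sky errors, or neighboring contamination")

box, r_e = converge_box(galaxy_diameter=30)
print(box, r_e)
```

The loop stops only when enlarging the box no longer moves the answer, which is exactly the reliability test described above.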
A PSF can be obtained from an image, from a theoretical model, or some combination thereof. A theoretical model can be simple, like a Gaussian function, or quite sophisticated. For example, for data from the Hubble Space Telescope, theoretical models are made by a program called TinyTim. However, for high contrast analysis (e.g. quasar host galaxies), theoretical PSFs are usually not ideal, because the profile may not be realistic, or because the PSF of the data may change with time and position on a detector. So ideally one should prefer PSFs extracted empirically from the data images (which have also undergone the same image processing that may distort the PSF) over theoretical models. However, doing so from data is not always easy, because there are usually few, if any, stars that are bright, unsaturated, and isolated. Therefore one often has to make do with faint stars, or come up with a hybrid technique that combines analytic models with empirical data. For instance, if there are only faint stars, it might be possible to create a high quality PSF from the data by adding them up to increase the signal-to-noise (S/N). Or one can fit analytic models to the stars themselves and use the analytic model as the PSF, which would then have infinite S/N. A third way would be to take a theoretical model and convolve it with one or more functions to try to match the faint stars in an image. The details of how to do so are explained below.
WHAT IS AN IDEAL PSF?
An ideal PSF satisfies all of the following criteria:
1. Isolated from everything else: The PSF has to be isolated from any and all sources of contamination, including other stars, galaxies, and detector artifacts or defects, negative and positive residuals alike. If any of these things exist in a PSF image, they will show up again, often even more strongly, when the model is convolved with the PSF, which is undesirable and may have weird consequences.
2. Very high signal-to-noise (S/N) -- ideally, infinite: The S/N of the PSF should ideally be infinite; otherwise, when the PSF is convolved with a model, the result will look noisy. This may affect the quality of the fit. The technique section just below explains how to obtain a PSF with effectively infinite S/N.
3. Flat and zero background: The background has to be flat and zero in value, i.e. the sky background has to be removed and cannot have a gradient, blotches, weird moire or other patterns. If any of these things show up in a PSF image, they will also show up in the model image when it is convolved with the PSF. A PSF image that has a non-zero sky background will show a "jump" between regions inside and outside the convolution box, and will affect the fitting parameters.
4. Should match the detailed shapes of a stellar image, including things like ellipticity, diffraction rings and spikes, extended halo, speckle pattern, etc.: This means that if you subtract the PSF from an image of a field star, there should be nothing left over except for noise. This is as much a statement about how to construct a PSF as it is about how to check whether your PSF is a good one: go and do a model subtraction! Keep in mind that if there is significant geometric distortion across an image, the PSF will be position dependent. If you can't obtain a high S/N PSF from an image, the next best thing is to match the full width at half maximum (FWHM) of the stars.
5. Be correctly centered in the image: The peak of the PSF needs to be centered in a specific way described below. If it is not, when the model is convolved with the PSF, the flux centroid will appear to be shifted relative to the (x,y) position that GALFIT reports, because of the convolution theorem. This shift will only affect the part of the model that lies within the convolution box; the region outside the convolution box will not be affected. To center the PSF correctly, follow this guideline:
How to center the PSF: If the number of pixels, NPIX, along an axis (row/column) is odd, the PSF peak should be centered at pixel NPIX/2+0.5; if NPIX is even, at NPIX/2+1. The PSF image can be rectangular.
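The centering rule can be written as a one-liner. This is just the arithmetic from the rule above, with 1-indexed pixel coordinates as in FITS/IRAF:

```python
def psf_center(npix):
    """1-indexed coordinate of the PSF peak along an axis of npix pixels,
    per the rule above: odd -> NPIX/2 + 0.5, even -> NPIX/2 + 1."""
    return npix / 2 + 0.5 if npix % 2 == 1 else npix // 2 + 1

print(psf_center(5))    # 3.0 -> center of the middle pixel of a 5-pixel axis
print(psf_center(4))    # 3   -> one pixel to the right of the midpoint
```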
6. The image of the PSF should be sufficiently large to contain all the light, but not gratuitously large: The image of the PSF should be large enough to contain 100% of the light. If the PSF image is so small that it cuts off the wing of the PSF, two things will happen. First, the model image will show a "cliff" where the wing is truncated. Second, the flux of the model will be in error by the fraction of light that is cut off. However, don't go overboard and generate a PSF image that is gratuitously large, because doing so will slow GALFIT down greatly; image convolution is one of the biggest bottlenecks in fitting. The rule of thumb is to start with a small PSF while you're trying to figure out what models to use, how many, what combination is best, etc. This may take several tries. Then, once you obtain a solution you think is final, use a larger PSF to redo the analysis one last time to get the fluxes and detailed profile shapes right.
The key to figuring out "how big is big enough" for a PSF image is to try wider and wider PSF images until the result of the fits no longer depend on the PSF size. That usually happens somewhere between 20-50 times the FWHM of the PSF (there's a lot of flux out there). That means a typical PSF image is between 50 to 100 pixels on each side, but may be larger sometimes -- always check!
7. Nyquist sampled: This means that the full width of the PSF at half the peak, i.e. the FWHM, should be at least 2 pixels. Sometimes the data image itself might not be Nyquist sampled, in which case obtaining a Nyquist sampled PSF is tricky, but may be possible using sophisticated "interlacing" tricks. If the PSF is Nyquist sampled, then there's not much more to say about this issue. If not, then there is a lot to be said, but it is beyond the scope of this FAQ.
In the event that the data itself is not Nyquist sampled, but you can obtain a Nyquist sampled PSF, such as through using TinyTim for HST images or some other way, you can give that PSF to GALFIT. For instance, if the PSF is N times better sampled than the data, where N is an integer, specify the value N in parameter E of the GALFIT input file. If the convolution PSF is not Nyquist sampled, that is a bad situation: GALFIT can still convolve a model with it, but the final results will likely not be very accurate.
1. Image extraction: To extract a PSF from an image, you'll need to find a star that has a high S/N *relative* to the object you're trying to fit. If there isn't one star that is bright enough, you can also construct high S/N PSFs from several stars by shifting and adding stamp images of stars, or using DAOphot. Using DAOphot is rather involved and is beyond the scope of this discussion. So I refer the user to the source documentation.
If there's a suitable star (high S/N and not vignetted) in an image, one can often just cut out a small region around it with the following two steps in IRAF:

> imcopy data.fits[xmin:xmax,ymin:ymax] psf.fits
> imarith psf.fits - skyvalue psf.fits

You may still need to "imshift" the psf.fits a little to get it centered correctly. But be careful how you do this, because shifting a PSF may broaden it if it is not at least Nyquist sampled. To determine xmin, xmax, ymin, and ymax, please see the above CAVEAT on how to center the PSF, and also the discussion of PSF size below. Note again that the sky pedestal and other objects have to be removed from the PSF image.
2. TinyTim: For HST or Spitzer data, TinyTim may be a decent alternative technique for getting PSFs when there is no better option. For galaxy fitting, TinyTim is often "good enough." For high contrast imaging, like trying to separate a bright quasar from a diffuse underlying galaxy, it is not; there the above technique is better. Please see the TinyTim website for details on how to use this very nice and easy-to-use software. As useful as it is, TinyTim often doesn't generate a PSF that is perfectly suited for data analysis without some further tweaking, because an HST image suffers from things like spacecraft jitter and instrument "breathing" that change the platescale noticeably, among other effects. That means the actual PSF is often a bit broader than the TinyTim PSF, which should be corrected. To better match the PSF FWHM, follow the next step.
3. Analytic profile: Ground-based images often have near-Gaussian or Moffat-like PSFs. So in principle, if you know the FWHM of the seeing, you can create an analytic function to use as the convolution PSF. The crudest way to go is to create an analytic Gaussian or Moffat function using one's favorite software, matching just the FWHM of the stars.
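As a sketch of that crudest approach, here is a pure-Python Moffat PSF matched only to a given FWHM. The image size, FWHM, and beta values below are hypothetical, and in practice any astronomy package can do the same:

```python
import math

def moffat_psf(npix, fwhm, beta=3.0):
    """Square Moffat PSF image (list of rows), normalized to unit total flux.
    npix is assumed odd here, so the peak sits at pixel NPIX/2 + 0.5."""
    # Moffat profile: I(r) ~ [1 + (r/alpha)^2]^(-beta),
    # with FWHM = 2 * alpha * sqrt(2^(1/beta) - 1)
    alpha = fwhm / (2.0 * math.sqrt(2.0 ** (1.0 / beta) - 1.0))
    c = (npix + 1) / 2.0                       # 1-indexed center for odd npix
    img = [[(1.0 + ((x - c) ** 2 + (y - c) ** 2) / alpha ** 2) ** (-beta)
            for x in range(1, npix + 1)]
           for y in range(1, npix + 1)]
    total = sum(map(sum, img))
    return [[v / total for v in row] for row in img]

psf = moffat_psf(npix=25, fwhm=3.0)
peak = max(max(row) for row in psf)
print(abs(sum(map(sum, psf)) - 1.0) < 1e-9)   # normalized to unit flux
print(psf[12][12] == peak)                     # peak at the center pixel
```

For a Gaussian instead of a Moffat, replace the radial law with exp(-r²/(2σ²)), where σ = FWHM/2.3548.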
If you have only moderately faint stars to infer the PSF in an image, but you really need a high quality, essentially infinite S/N PSF -- because your galaxy is really bright, or you're trying to deblend an object with very high contrast sub-components, like a quasar sitting on top of a faint galaxy -- then there are several tricks to try:
One important thing to keep in mind about most space telescope images is that the pixels actually undersample the PSF, i.e. the FWHM of the PSF is often less than 2 pixels. As explained above, that means convolution will not be done correctly. However, with theoretical PSFs one can in principle get around that problem, because you can generate models with arbitrarily fine sampling. So the key here is to generate a TinyTim PSF that is oversampled compared to the data and use that as the convolution kernel (Parameter D), telling GALFIT the oversampling factor (Parameter E). When GALFIT convolves the PSF with a Gaussian, it will generate that Gaussian on an oversampled grid and bin the model down before matching to the data (this all happens without user intervention). Once you obtain the best fit, generate an "effective PSF" by setting the oversampling factor (Parameter E) to 1 to prevent GALFIT from re-binning. This effective PSF is then what you should use for data analysis.
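For concreteness, the relevant lines of the GALFIT input file look roughly like this (the file name and the factor of 3 are hypothetical; parameter D names the PSF image and parameter E its fine-sampling factor):

```
D) psf_x3.fits     # Input PSF image (here assumed generated 3x finer sampled than the data)
E) 3               # PSF fine sampling factor relative to the data
```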
There are other reasons for why errorbars may be too small besides imperfect models, such as image scaling, or PSF mismatch (another form of imperfect model). And as always, it is good to try to identify and understand the different factors separately:
The situation just described is rarely a good assumption when dealing with real galaxies, because real galaxy profiles don't often correspond to perfect analytical functions. They also often have structures that can't be modeled, e.g. stars, neighboring galaxies, dust lanes, star formation regions, spiral structures, you name it. The more things there are in the image that can't be explicitly accounted for or masked out, the *smaller* the statistical uncertainties become -- the parameters don't have to change very much to increase χ² by a lot, because the residuals are correlated and all conspire in non-random ways.
There is, however, one exception, which happens when the galaxy being fitted has low signal-to-noise. In this situation, the Poisson noise of the object dominates over the structural fluctuations due to model mismatch, so the errorbars reported by GALFIT are meaningful.
Despite the lack of a statistically robust way to estimate the errorbars via χ² statistics, one sensible alternative is to estimate *empirical* errorbars. This means repeatedly refitting while varying the masking of problematic regions, the image size, the PSF assumptions, etc., keeping track of all the parameter values. Given the spread of final parameters over the different realizations, one can then quote a sensible mean and standard deviation for each parameter. Hardcore statisticians would not like this approach, but there are no other clear alternatives that are easy to implement....
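The empirical approach amounts to refitting under perturbed assumptions and taking statistics over the results. Here is a schematic sketch; `refit_with` is a hypothetical stand-in for rerunning GALFIT with one assumption changed, and the Sérsic-index values are invented for illustration:

```python
import statistics

def refit_with(variant):
    # Placeholder: a real version would rerun GALFIT with this mask /
    # image size / PSF choice and parse the fitted Sersic index.
    fake_results = {"mask_a": 2.10, "mask_b": 2.25, "bigger_box": 2.05,
                    "smaller_box": 2.30, "alt_psf": 2.18}
    return fake_results[variant]

variants = ["mask_a", "mask_b", "bigger_box", "smaller_box", "alt_psf"]
values = [refit_with(v) for v in variants]

n_mean = statistics.mean(values)
n_err = statistics.stdev(values)   # empirical errorbar: scatter across realizations
print(f"n = {n_mean:.2f} +/- {n_err:.2f}")
```

The quoted uncertainty is then the scatter across realizations, not the (usually too small) χ²-based errorbar.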
The initial parameters for the Fourier modes can be all zeros.
To account for these and other common situations, GALFIT has several ways to decide when to quit. The simplest is when it reaches 100 iterations, at which point it will try to quit at the first opportune moment (unless the rate of convergence picks up momentum) -- this is what the countdown clock normally shows at the start. But the vast majority of the time χ² will have bottomed out after 20-60 iterations, even for many-component fits. When Δχ²/χ² becomes really small, i.e. < 10^-4, GALFIT gives it 10 more iterations by starting a countdown clock. After this time, if χ² continues to decrease only negligibly, GALFIT will quit. However, should χ² improve significantly during one of those 10 iterations, the countdown aborts and the original clock of 100 iterations resumes. So, toward the χ² minimum, the countdown clock might get triggered time and again.
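The stopping rule just described can be sketched roughly as follows. Only the numbers 100 (maximum iterations), 10 (countdown length), and 10^-4 (tolerance) come from the description above; the rest is an illustrative stand-in, not GALFIT's actual code:

```python
# Schematic sketch of the stopping rule described above (not GALFIT's code).

def fit_loop(chi2_steps, max_iter=100, countdown_len=10, tol=1e-4):
    """Iterate until chi^2 stops improving; chi2_steps yields chi^2 per iteration."""
    countdown = None
    prev = next(chi2_steps)
    for it in range(1, max_iter + 1):
        chi2 = next(chi2_steps)
        rel_drop = (prev - chi2) / chi2
        prev = chi2
        if rel_drop < tol:
            if countdown is None:
                countdown = countdown_len   # negligible improvement: start countdown
            countdown -= 1
            if countdown == 0:
                return it, chi2             # still negligible after 10 more: quit
        else:
            countdown = None                # chi^2 improved again: abort countdown
    return max_iter, prev                   # hit the 100-iteration ceiling

def slow_steps():
    chi2 = 1000.0
    while True:
        yield chi2
        chi2 *= 0.99999                     # ~1e-5 relative drop per iteration

stop_iter, final_chi2 = fit_loop(slow_steps())
print(stop_iter)                            # quits once the 10-step countdown expires
```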
Note that ever since the Linux/Unix people decided to continuously mess with the shell command "ps" with each version update, I have been having difficulty trying to keep track of which release uses which command flag option. As a result, option P="both" may not work for everybody the way it is intended -- which is too bad, because it's really useful. If you are having problems and if you would like to use it, email me and I'll work with you to find a solution for your specific machine. But, make sure you remember what the solution is, because later on you may need to do it again if I forget to implement it for some reason.
However, to solve the problem more generally, you should delete the inherit keyword from the original image.
> hedit imgblock.fits inherit delete+ update+ ver-