How do I determine the initial parameters?

By all means: just take a guess and see what happens.

One may think that taking a guess is unwise, perhaps because of prior notions about parameter degeneracy, local minimum solutions, or other concerns regarding Levenberg-Marquardt algorithms. These are legitimate concerns, and there is certainly a limit to how far off initial guesses can be, otherwise guesses wouldn't be necessary. However, it is much more common for people to attribute problems to parameter degeneracy or local minimum solutions than to other, more mundane and less obvious, causes. For example, neighboring contamination, an incorrect sigma image, a sky level determined independently and held fixed at an incorrect local value, flatfield non-uniformity, the image fitting size, a subtle bug in one's code, etc., can all lead to incorrect and unexpected outcomes. If the wrong cause is blamed, it is unlikely that any algorithm, downhill gradient or otherwise, would be able to find the correct solution, because the desired solution would not yield the best numerical merit, e.g. χ². There are ways to figure out the true causes, and knowing how to make an accurate diagnosis is important both for getting the correct answer and for not performing ill-conceived follow-up analysis to ostensibly "correct the errors," which can lead to further problems or, worse, a false sense of security. For example, if the sigma image is wrong and one ascribes the systematic errors to parameter degeneracy, one may be led to try to correct the errors in the final fit parameters, producing solutions that are far worse than fixing the problem at its source (as seen here).

In practice, even for an algorithm like Levenberg-Marquardt, which is a downhill gradient type algorithm, concerns about how good the initial parameters need to be are often overstated for galaxy fitting. Whereas people worry about finding initial guesses that are accurate to 30-50%, which is nice, the actual allowed tolerances are often far larger: on the order of hundreds, thousands, sometimes even tens or hundreds of thousands, of percent, depending on the signal-to-noise and the complexity of the model. Suffice it to say that, for single component Sérsic fits, if your guesses are off by less than 100%, they're probably fine. This is not always the case for multi-component fits of a single galaxy: the more components one uses, the more accurate the initial guesses should be for the algorithm to converge quickly on a good solution.
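To make the tolerance claim concrete, here is a minimal toy sketch (not GALFIT; a hypothetical 1-D exponential profile and SciPy's Levenberg-Marquardt optimizer stand in for the real 2-D problem), with initial guesses deliberately off by about 100%:

```python
# Toy sketch: a 1-D exponential profile fit with a Levenberg-Marquardt
# optimizer, started from guesses that are off by a factor of ~2.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def expdisk(r, flux0, rs):
    """Exponential disk: surface brightness vs. radius."""
    return flux0 * np.exp(-r / rs)

r = np.linspace(0.0, 30.0, 200)
true_flux0, true_rs = 100.0, 5.0
data = expdisk(r, true_flux0, true_rs) + rng.normal(0.0, 1.0, r.size)

# Initial guesses deliberately off by ~100%:
p0 = [200.0, 10.0]
popt, pcov = curve_fit(expdisk, r, data, p0=p0, method="lm")

print(popt)  # recovered (flux0, rs), close to the input (100, 5)
```

Despite starting a factor of two off in both parameters, the fit converges to the input values; this is the kind of tolerance described above, though real images with neighbors, sky errors, and PSF convolution are of course harder.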

This discussion is not meant to persuade anyone that numerical degeneracies or local minima don't exist. Rather, with enough experience, the initial concerns usually give way to a more accurate understanding and, hopefully, better application of the software tool. There is no shortcut to learning the true versus imagined or preconceived limitations of the analysis technique except by trying, especially because galaxy profiles are complex things. However, there are some recipes and rules of thumb, gained through experience.

Why is the GALFIT output image blank?

The output image produced by GALFIT is an image block, with the zeroth image slice being blank. Thus, displaying the image without specifying the proper image slice will show a blank image. To see the image block using DS9, go to "File"==>"Open Other"==>"Open Multi-Ext as Data Cube." To display it using IRAF, type "display imgblock[1]". Use [2] and [3] to see the other image slices. There are several ways to show image blocks in IDL and other software as well.

How biased are the size and luminosity measurements by low signal-to-noise when you can't see outer regions of galaxies?

A question people often have is whether signal-to-noise (S/N) can cause systematic biases in the fitting parameters. Some believe model fitting is likely to underestimate the size and luminosity parameters because our intuition tells us that we can see only "the tip of the iceberg," so the code is likely to miss flux that lies beneath the noise. It is also common to think that one has to account for all the structural components in galaxies in order to properly measure the total luminosity and the global size. Therefore, if the outer wings have too low S/N, we can't account for them, so the measurements are likely to be biased toward smaller values. This belief often leads one to choose non-parametric techniques, like aperture or curve-of-growth (COG, see below) analyses, because they do not rely on one's knowledge of galaxy structures to measure luminosities and sizes. This section and the one below explain why adopting non-parametric techniques may actually steer one closer to the very pitfalls one is consciously trying to avoid.

The reason why parametric techniques are very useful, and often better than non-parametric techniques, in low S/N situations is that parametric fitting is fundamentally about model testing. Model testing is a strength, not a weakness, because it can use both the signal and the noise information via a sigma image, combined together in χ², in a rigorous and, just as importantly, proper, manner. In other words, in low S/N situations when we just see the "tip of the iceberg" of galaxy centers, model testing tries to figure out how much extended wing is allowed by the noise. Noise is information, and important information at that. If the noise is large, it doesn't mean that the model "doesn't see" the galaxy. Quite the opposite: it means that more and more extended models are consistent with the noise, which is why the errorbar on size becomes larger and larger. Indeed, it is incorrect to think that because we only see the tip, that must also be what a parametric algorithm "sees," and to expect the code to behave according to that intuition. Another factor contributing to the misconception is a belief that parametric analysis is highly sensitive to parameter degeneracy -- that critique is analyzed here and shown to be generally undeserved. Lastly, it is common to think that parametric analysis needs to account for all the components for it to work properly. If one is interested in the physical meaning of the multiple components of a galaxy, then that is true, but only so that the interpretation is clear, not because the fit is otherwise somehow wrong in some abstract sense. For single component fits the meaning is clear: it is about finding a global model that minimizes the residuals, and that does not require the profile function to be perfect for the technique to work properly. It would certainly be better for the profile functions to be more accurate, through multi-component fitting, to further reduce the residuals. This can even be done in an automated way if one does not need to know the meaning of the subcomponents, but only to measure the total flux and size.
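The claim that noise widens, rather than biases, the range of acceptable models can be sketched with a toy one-parameter χ² scan (purely illustrative: a 1-D exponential with fixed amplitude and made-up noise levels, not a GALFIT run):

```python
# Sketch: noise widens the range of acceptable sizes instead of biasing it.
# We scan chi^2 over a grid of scale lengths at high and low S/N, and measure
# the width of the interval with chi^2 within 1 of the minimum.
import numpy as np

rng = np.random.default_rng(1)
r = np.linspace(0.0, 30.0, 300)

def model(rs):
    # One-parameter toy model: exponential profile with fixed amplitude.
    return 100.0 * np.exp(-r / rs)

def allowed_width(sigma):
    data = model(5.0) + rng.normal(0.0, sigma, r.size)
    rs_grid = np.linspace(2.0, 10.0, 400)
    chi2 = np.array([np.sum(((data - model(rs)) / sigma) ** 2)
                     for rs in rs_grid])
    ok = rs_grid[chi2 <= chi2.min() + 1.0]  # ~1-sigma interval
    return ok.max() - ok.min()

w_hi = allowed_width(0.5)   # high S/N
w_lo = allowed_width(20.0)  # low  S/N
print(w_hi, w_lo)           # the low-S/N interval is much wider
```

The central value stays consistent with the input in both cases; only the width of the acceptable interval, i.e. the errorbar, grows with the noise.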

To illustrate the above claims it is useful to turn to some simple simulations. Figures 1a, 1b, and 1c show 3 controlled experiments based on simulated galaxies, each one created to have a bulge and a disk of 1:1 ratio in luminosity (left panels), with the parameters of the Sérsic profiles shown in the top two rows. When analyzing the data, we fit a single component Sérsic function (middle panels) to purposefully simulate a model mismatch that is often cited as a main source of weakness of the technique (see residuals in the top right images). The simulation is repeated for several very high and very low S/N situations. In a single component fit, one sees that the size parameter comes out roughly midway between the bulge and disk sizes. This makes sense because the bulge-to-disk luminosity ratio is created to be exactly 1-to-1.


Figure 1a: a two component model with intrinsic parameters shown on the top row, outside the images. Top row: the first image (left) shows a model with very high S/N of 5000 at the peak, the middle image shows a single component fit to the data, finding the parameters shown in the square box, and the third image (right) shows the residuals from the fit. Bottom row: images showing a similar sequence for the same galaxy, now with very low S/N of only 10 at the peak. Despite the fact that the galaxies have very different S/N levels, despite the fact that a single component model is a bad fit to the data, and despite the fact that the disk component has completely faded below the noise in the second simulation, the size and concentration parameters are measured to be nearly identical, to within the error bars given by GALFIT.

Figure 1b: See Figure 1a for details. Similar simulation as Figure 1a except with different bulge size and concentration index. Again, the sizes measured are essentially identical between low and high S/N simulations to within the errors, despite vast differences in S/N.

Figure 1c: See Figure 1a for details. Similar simulation as Figure 1a except with different bulge size and concentration index. Here, although the size estimate of the fainter image is larger than for the high S/N image, it is still within the errorbar. Moreover, the direction is toward finding a larger size for the fainter image, contrary to expectations that fading of the outer regions causes the size estimate to be biased toward a lower value.

As we can see from the above simulations, the notion that model mismatch and S/N would preferentially bias parametric analysis toward finding smaller sizes and lower luminosity is not realized in practice. When statistics are properly taken into account in low S/N simulations, the noisier data are consistent with a wider range of acceptable models, which produces parameters that have larger errorbars than high S/N simulations. But the central values are nevertheless consistent between very high and extremely low S/N situations to within the errorbar. In contrast, the discussion immediately below explains why non-parametric analyses, which often do not properly account for noise in the data, may cause one to underestimate galaxy sizes and luminosities.

Why determine galaxy sizes and luminosities with parametric instead of non-parametric techniques?

Measuring galaxy sizes and luminosities is a tricky business because most galaxies don't have sharp edges. There are two general approaches to measuring sizes and luminosities: non-parametric and parametric. Non-parametric techniques involve aperture photometry of some sort like curve-of-growth (COG) analysis, whereas parametric techniques involve fitting functions to images of galaxies. The most common definitions of "size" involve measuring flux inside a certain radius, such as the half-light-radius, or perhaps some kind of scalelength parameter in the case of functional model fitting. Luminosities are often measured using aperture flux inside a certain radius, or flux integrated out to some surface brightness level, or based on the total flux from analytic models or COG. Both parametric and non-parametric techniques have benefits and drawbacks, some of which are discussed in this section.

The above section explains why parametric techniques are useful in low S/N situations because of the premise of model testing.

In contrast, non-parametric techniques, like the curve-of-growth (COG) analysis, are not about hypothesis testing. They often use the noise information to decide where to stop extracting a galaxy profile. As such, the technique indeed corresponds more to the notion of "what you see is what you get," because the noisier the data, the closer in to the center the galaxy seems to "end." COG and other non-parametric techniques often do in fact use the noise information, but they often don't use it properly. To be concrete, imagine trying to image a big, bright galaxy, say M31, with a very short exposure time, say a microsecond. The galaxy is there in the image, but at such a low level that the image is dominated by sky and readnoise. A curve-of-growth analysis usually compares the flux gradient in an image to the noise in order to determine where the galaxy "ends." It would therefore find a sky- and readnoise-dominated image to be nearly perfectly flat, to very high precision when the image is large. This would lead one to conclude, with high confidence, that the galaxy is 0 arcseconds in size, based on careful statistical analysis of the background noise. This, of course, is the wrong answer. A parametric analysis would tell you that the size is consistent with some central value, but that the value has an uncertainty of ±∞, which is another way of saying that a model of 0 size is as good as an extremely large one that potentially includes the correct answer. Being highly uncertain about an answer because of statistics is not wrong. However, being absolutely certain about an incorrect answer is wrong, and that is a potential pitfall of non-parametric analysis.
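This thought experiment can be sketched numerically (illustrative numbers only): a curve of growth measured on a noise-dominated image simply random-walks around zero, so a naive gradient test "finds" no galaxy at any radius, even though a real source is present in the data:

```python
# Sketch: a curve of growth on a sky/readnoise-dominated image. The summed
# flux just random-walks around zero, so a naive COG analysis "detects" a
# galaxy of zero size with high confidence, even though a real galaxy is
# present far below the per-pixel noise.
import numpy as np

rng = np.random.default_rng(2)
n = 201
y, x = np.mgrid[:n, :n]
r = np.hypot(x - n // 2, y - n // 2)

# A real galaxy, hidden far below the per-pixel noise of 1:
galaxy = 0.001 * np.exp(-r / 10.0)
image = galaxy + rng.normal(0.0, 1.0, (n, n))

radii = np.arange(5, 100, 5)
cog = np.array([image[r <= rad].sum() for rad in radii])
print(cog)  # fluctuates around zero: no measurable gradient at any radius
```

A parametric model comparison on the same image would instead report a flux and size consistent with zero but with enormous errorbars, which is the honest statement of what the data allow.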

The relative benefits and drawbacks of parametric vs. non-parametric techniques are summarized below:

  1. Curve-of-Growth (COG) Analysis (non-parametric): The COG technique uses apertures of growing radii to sum up the light from a galaxy. One then performs an analysis on a curve of flux vs. aperture radius. As the aperture grows, so does the (sky subtracted) flux, until nearly all the light is included and the COG approaches an asymptotic value. The asymptotic flux value is taken to be the total light from the galaxy, and the size is then defined to be the radius which encloses some fraction, often 50% or 90%, of the total light, or the radius out to a certain surface brightness.

    • Model independent (sort of). This method does not depend on the shape of the galaxy and is thus said to be model independent. It can be applied to irregular galaxies and regular galaxies, and galaxies of all profile types (exponential, de Vaucouleurs, etc.). However, in practice this is only partly true because one replaces parametric models with interpretations about the behavior of the COG, or a functional fit to COG to derive the asymptotic integrated flux. Both of these processes are dependent on assumptions (i.e. model) about the morphology and environment. So while the notion is that the technique is robust for galaxies across all shapes, in practice the implementation may make the technique susceptible to implicit model assumptions about the COG, as elaborated further below concerning the weaknesses of this technique.

    • Simplicity. It is easy to apply this technique across galaxies of different shapes and sizes without making special considerations.

    • Does not account for the PSF. So when the galaxy size is only a few times the size of the PSF FWHM or the resolution element of the detector, or when the galaxy profile is steeply concentrated, this technique is not very reliable for measuring the size.

    • Signal-to-noise (S/N), thus morphology, dependent. COG and aperture techniques are to some degree S/N dependent because they rely on being able to measure a gradient in galaxies. One can only detect shallow profile gradients in very high S/N situations. This biases against measuring correct sizes for galaxies that have slowly declining light profiles, such as early-type or low surface brightness (relative to the noise) galaxies. Another reason why this analysis is S/N dependent is that, in the most common way of creating a COG, one has to first subtract out the background sky: if the gradient is shallow with low S/N, the sky will be over-estimated, which then affects the COG analysis. The decoupled, stepwise, approach makes it systematically more likely to overestimate the sky, underestimate the luminosity, and underestimate the size. See the figure below for a concrete example.

    • Crowding limited. COG does not work well if the background is not flat, because the COG may not flatten out. And the background may not be flat for all sorts of reasons, e.g. crowded environments, overlapping galaxies, flatfielding errors, bad sky subtraction, etc. This gives rise to a frequent dilemma: if the COG starts to grow again at larger radii after showing signs of flattening off, how does one decide which of the various factors is responsible? How does one quantify it? Making this decision is non-trivial and intuition dependent. Without being able to quantify how much those different variables matter, decisions about this issue in turn impact the extraction of the morphology parameters, making COG dependent on situation and morphology.

  2. Parametric fitting: The size and luminosity parameters are built into the functional form of the light profile model.

    • Accounts for the PSF. The technique tests for consistency of a galaxy profile with model functions by fitting them to the data after convolving with the PSF. It is thus capable of measuring or constraining the shape of the galaxy profile all the way into the center. This is necessary if one needs to measure galaxy sizes that are close to the PSF or pixel resolution limit.

    • Deals better with low S/N situations. It is often said that analytic profiles can "extrapolate below the noise" to better recover the total flux; this is an intuitive, generally correct, though somewhat imprecise, explanation. A more accurate way to think about it is that in parametric analysis it is not just the flux that matters, but the amount of flux compared to the noise expected for that flux, encoded in the definition of χ². Because of χ², weak signal accumulates over large areas even when our eyes cannot detect any flux. These two facts mean that even if a galaxy has extended wings that get washed out by the noise, a model with a large envelope would still be consistent with the data given the noise (sigma). At low S/N, the size and luminosity determinations are more uncertain because models with smaller and larger sizes may also be consistent with the noise. However, this does not necessarily mean the fit is biased one way or another. For biases to appear, the flux signal would have to be unrepresentative of the sigma at that flux level. In contrast, in COG analysis the size determination is based on measuring the gradient of the light profile. Because the gradient does depend on the S/N, when there is noise, and only noise, COG analysis will always bias size estimates toward smaller values (see the figure that follows).

      As explained above in two different sections, and summarized here, there is a fundamental conceptual difference between COG and parametric techniques in how one interprets the measurements given noise: in the limit where the exposure time is so short that no flux from a galaxy can yet be detected, the COG analysis would be 100% certain that the central value is 0, i.e. there is no galaxy there, because the galaxy gradient is exactly 0. In the same situation, the parametric analysis would say that the parameters are completely uncertain, without a firm commitment to the central value, i.e. the errorbar would be large and include 0 flux as a possibility. The ability to properly interpret the data given the noise is why the hypothesis testing of parametric analysis is superior to non-parametric techniques.

    • Better deals with crowding and other complex situations. Multicomponent analysis can deblend overlapping galaxies and even deal with situations of non-uniform backgrounds. It is not only easier to visualize the situation when subtle problems arise (flatfielding errors, sky gradient, neighboring contamination, etc.), but by accounting for them explicitly in the models, one can quantify their effects in the parameter estimates.

    • Model dependent. Galaxies don't have perfect analytic shapes that we try to force fit on them, both in terms of their radial light profile and their projected shape. Thus it is not clear sometimes what models mean. Sometimes a bad fit can adversely affect the luminosity and size measurements. However, see sections above and below.

    • Difficult to implement. How many components should one use? When is a fit good enough? When should one turn on the Fourier modes to account for lopsidedness or irregularities? How should one deal with neighboring galaxies in an automated way? How does one interpret the "size" when a galaxy is fitted with multiple components? These are all legitimate concerns and are certainly more difficult to deal with than non-parametric techniques. However, the problems are actually technically quite tractable by being a little bit more clever. Some solutions are discussed here and elsewhere in this document.

    • Parameter degeneracy. This is often thought to be a serious issue by people who speculate on potential difficulties of parametric analysis but have little experience with the technique. However, the seriousness of this issue is generally quite overstated. In many instances, what may seem like parameter degeneracy may be due to other factors one may not be aware of, such as the sigma image, non-uniformity in the sky background, or not properly accounting for neighboring sources. As discussed in the section below, when the main issues are properly dealt with, serious concerns about parameter degeneracy do not seem warranted.
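To ground the COG recipe of item 1 above, here is a minimal sketch of the procedure applied to a noiseless pure exponential (Sérsic n=1) image, where the answers are known analytically (total flux = 2π Σ0 rs², half-light radius r50 ≈ 1.678 rs):

```python
# Sketch of the curve-of-growth procedure on a noiseless pure exponential
# image, where total flux and half-light radius are known analytically.
import numpy as np

n = 601
y, x = np.mgrid[:n, :n]
r = np.hypot(x - n // 2, y - n // 2)

flux0, rs = 1.0, 20.0            # central surface brightness, scale length
image = flux0 * np.exp(-r / rs)

radii = np.arange(1, 300)
cog = np.array([image[r <= rad].sum() for rad in radii])

total = cog[-1]                  # asymptotic value of the COG
r50 = radii[np.searchsorted(cog, 0.5 * total)]
print(total, r50)                # ~2*pi*rs**2 = 2513 and ~1.678*rs ≈ 33.6
```

On perfect, noiseless, isolated data the procedure works as advertised; the weaknesses listed above appear when noise, PSF blurring, sky errors, or neighbors disturb the asymptotic behavior of the curve.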
To make the above discussion about S/N more concrete, the following figure, taken from Ichikawa, Kajisawa, & Akhlaghi (2012), shows what happens when one uses aperture photometry techniques to determine the size of galaxies rather than through model fitting. The photometry is performed using SExtractor, but similar effects would be seen in all aperture techniques. The data points are pure analytic exponential (Sérsic n=1, blue) and de Vaucouleurs (Sérsic n=4, red) profiles:


Figure 2: Figure taken from Ichikawa, Kajisawa, & Akhlaghi (2012) showing sizes determined from aperture photometry using SExtractor. Left -- the ratio of the measured half-light radius (r50) to the input model half-light radius, plotted against the input magnitude. One should not take the x-axis magnitude numbers literally because it is the signal-to-noise that really matters. Blue data points are for pure exponential profiles and red for pure de Vaucouleurs profiles. Right -- the same for the radius containing 90% of the light. Compare this figure with how much better parametric model fitting can do here.
From the above figure, it's clear that aperture photometry produces large systematic errors at low signal-to-noise (S/N) and when the galaxy profile is steep (red data points). Furthermore, notice how the systematic errors in the sizes trend downward, as explained above. To drive home the above points: what makes parametric fitting a superior technique is that it allows one to test the consistency of the model in the presence of noise, beyond our visual impression of where the galaxy "ends." Again, it is worth emphasizing that parametric fitting doesn't "see" in the same way that our eyes do.

How do I measure the half-light radius (re) for a multi-component galaxy, or when the residuals don't look smooth?

All galaxies are multi-component, so this is a general issue. When performing parametric fitting to determine the sizes of galaxies, it is not obvious how to define "size" when a galaxy is lopsided, highly irregular, or made up of several components. Furthermore, when a galaxy is not very well fit and the residuals are large, the size measurement can be misleading if the sky estimate is also affected by the bad fit. Therefore, how does one determine the size? There are several methods that come immediately to mind.

  1. First, even if a galaxy is not well-fitted using a single component, if the Sérsic index is fairly low, i.e. below n~4, and the sky is well determined, then the size of the galaxy should not be too badly affected. To check whether that's the case, or to try to do a better job, one can try the following:

  2. Fit the galaxies as well as possible using multiple components. Here the goal is not to worry about what the components mean; they may or may not have physical significance. The goal is to find a set of components that represents the galaxy light distribution as well as possible. If any component has a Sérsic index larger than 4 and the residuals are large, then either consider holding that index fixed to n=4 or add more components to reduce the residuals; otherwise the fit may affect the estimate of the sky background value. If the Sérsic index is high because there's an extended wing, then additional components will be able to make up for that. Make sure none of the components have any problems; problematic parameters will be marked by *...* in GALFIT version 3 and beyond. After you obtain a good fit, hold the sky value fixed to that determined by GALFIT, then replace all the components with just a single Sérsic model: this will let you estimate an effective Sérsic index and an effective radius. If the Sérsic index does not behave when doing so, hold it fixed to a value that yields the same total magnitude as the multi-component fit.

  3. Another variation based on the above technique is that, after obtaining the best multi-component fit, tell GALFIT to generate a summed model without the sky and without convolving the model with the PSF. Then, using simple aperture photometry, find an aperture radius that gives half the total light of the analytic model. This has a benefit that the "half-light-radius" means exactly what it says, without subtleties about how well the galaxy corresponds to a Sérsic profile, and with the seeing properly taken into account.
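A minimal sketch of method 3, with a made-up analytic bulge+disk image standing in for the GALFIT-generated model (no sky, no PSF convolution):

```python
# Sketch of method 3: aperture photometry on a noiseless, sky-free analytic
# model image (here a toy de Vaucouleurs-like bulge + exponential disk) to
# find the radius enclosing half of the total model light.
import numpy as np

n = 601
y, x = np.mgrid[:n, :n]
r = np.hypot(x - n // 2, y - n // 2)

# Toy two-component model; the central radius is clamped to half a pixel to
# avoid an unphysical spike in the steep bulge profile at r=0.
bulge = 50.0 * np.exp(-7.67 * ((np.maximum(r, 0.5) / 10.0) ** 0.25 - 1.0))
disk = 1.0 * np.exp(-r / 30.0)
model = bulge + disk

total = model.sum()
radii = np.arange(1, 300)
cog = np.array([model[r <= rad].sum() for rad in radii])
re = radii[np.searchsorted(cog, 0.5 * total)]
print(re)  # the summed model's effective (half-light) radius, in pixels
```

Because the curve of growth is run on a noiseless analytic image, the usual COG pitfalls (noise, sky errors, neighbors) do not apply, and the resulting "half-light radius" means exactly what it says for the summed model.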

What would cause systematic errors in fitting?

GALFIT lets people fit galaxy images in a number of ways, from single component to many components; from simple ellipsoid shapes to irregular, spiral, and truncated shapes. Having flexibility is useful for dealing with the wide range of situations one may encounter. On the other hand, flexibility may not be viewed as a good thing if one is worried about parameter or model degeneracies. Parameter degeneracy is a commonly voiced concern regarding model fitting in general. GALFIT is constantly being tested by many people, even 10 years after the first release. Testing GALFIT is as much about learning the limitations of the code as it is about making sure the code is being used correctly. For, even if the code works as it is supposed to, there are ways to use it incorrectly. Over the years, a number of studies have reported on single and multiple component fitting tests, in both manual and automated forms of analysis. Based on realistic simulations discussed in this section and in other refereed papers, GALFIT seems to work as intended. However, occasionally new studies report finding systematic biases that contradict simulations by prior studies. Some claim that small and large galaxies, or bright and faint galaxies, show different degrees of systematics. Others find that crowded fields can produce systematic errors. The reason for these behaviors is often attributed to parameter degeneracy. However, there are reasons to be skeptical of such claims. These and other situations are examined in this section.

The term "parameter degeneracy" means that there are multiple solutions to a problem that are statistically indistinguishable from each other. However, the term "degeneracy" has evolved to also mean local minimum solutions, or, generalized even further, anything the code does that is not expected. Yet there are many ways for systematic biases to appear in data analysis that have nothing to do with parameter degeneracy. The most common examples are situations with neighboring contamination, a wrong sigma image, too small an image fitting size, a bad convolution PSF, a gradient in the sky due to flatfielding errors, etc. To understand where systematic errors come from and what can be done about them, we have to discriminate between errors that are caused by things in the data, as opposed to limitations in the software. Whereas things in the data can and must be dealt with by an end user (e.g. masking out or simultaneously fitting neighbors), software limitations mean that there is either a bug (e.g. pixel sampling errors, internal gradient calculation errors) or a conceptual limitation (e.g. parameter degeneracies, conceptual flaws in implementation). Before delving into how things in the data can cause systematic errors, it is worth discussing first the tests that have been done to verify the software itself.
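As a concrete example of one such "thing in the data": an incorrect sigma image can mimic parameter degeneracy. Below is a sketch of the standard CCD noise model commonly used to build a sigma image; the gain and read-noise values are illustrative assumptions, not the defaults of any particular instrument:

```python
# Sketch of a standard CCD noise-model sigma image: Poisson noise on the
# source+sky counts (in electrons) plus read noise, converted back to ADU.
# The gain and read-noise values below are illustrative assumptions.
import numpy as np

gain = 2.0     # e-/ADU (assumed)
rdnoise = 5.0  # read noise in e- (assumed)
nexp = 1       # number of combined exposures

def sigma_image(image_adu, sky_adu):
    """Per-pixel 1-sigma uncertainty in ADU. image_adu is sky-subtracted,
    so the sky level is added back before computing the Poisson term."""
    counts_e = np.clip(image_adu + sky_adu, 0.0, None) * gain
    var_e = counts_e + nexp * rdnoise**2
    return np.sqrt(var_e) / gain

img = np.array([[0.0, 100.0], [400.0, 900.0]])
print(sigma_image(img, sky_adu=50.0))
```

If the sky level, gain, or read noise fed into such a model is wrong, the χ² surface is distorted everywhere at once, and the resulting systematic errors are easily misread as "degeneracy."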

Are 1-D techniques better than 2-D at fitting low surface brightness regions or irregular galaxies?

Are 1-D techniques more robust against neighboring contamination than 2-D fitting techniques?

There are often discussions about whether 1-D fitting, or some hybrid scheme combining 1-D with 2-D, is more robust against neighboring contamination than 2-D fitting. There is even one study from years ago which claimed that 1-D analysis, or a hybrid scheme, is better. However, after examining the claim closely, it seems the results came from not treating the 2-D analysis in the same way as the 1-D analysis. I am aware of no other study which finds that 1-D techniques are more robust, nor less so, against neighboring contamination. But if common sense is a guide, it would argue that 1-D techniques ought to be less robust, as discussed below.

It is worthwhile, first, to consider why it is surprising and counter-intuitive to claim that 1-D techniques are superior or more robust, all else being equal (experimental factors such as image fitting size, masking, etc.). How well 1-D analysis can be done depends on how well 1-D profiles can be extracted from 2-D images. If a galaxy is embedded in a sea of background sources, common sense would say that 1-D techniques have no way to rigorously deblend a neighboring galaxy, and the only recourse is to use "masking." But masking only goes so far, and cannot remove light which continues to extend beyond the mask. Therefore, the more crowded an environment, and the fainter an object of interest, the more problematic neighboring contamination is for extracting a 1-D profile. In fairly crowded limits, no amount of masking will do without blocking out the whole galaxy one is interested in fitting. As explained below, image fitting is especially sensitive to neighboring contamination because galaxy wings can be highly extended, which is one reason why accurate sky determination is sometimes difficult but very important. As also discussed above, both 1-D and 2-D techniques are equally sensitive to the faint extensions of galaxy profiles. Furthermore, Figure 7 clearly shows that object masking alone does not suffice to remove the influence of neighboring contamination, the extent of which depends both on the luminosity of and distance to a neighbor. Since accurate treatment of neighboring contamination is key for obtaining accurate results, it would seem sensible that a 2-D technique would be beneficial: in 2-D fitting, nearby neighbors can be simultaneously optimized and fitted to remove the contamination that can't be masked out.
Indeed, detailed, crowded, simulations of over 40,000 galaxies, and Figure 7, suggest that simultaneously fitting the neighbors, coupled with masking, is definitely more robust compared to no treatment of neighbors or only using object masking alone, especially when there are multiple neighbors which need to be simultaneously fitted.

Therefore, where did the notion that 1-D fitting methods are more robust than 2-D techniques come from? Here are some instances that have caught my attention -- errors of which one should be particularly aware when judging claims of that sort:

Why are the fit parameters so sensitive to the sky value and/or the presence of neighboring galaxies?

Is it sometimes better to use uniform (or another non-Poisson) weighting scheme in the fit?

Sometimes a discussion comes up about whether it is better to use a "uniform" weighting scheme instead of Poisson weights. The lead-in to the discussion is usually a belief, based on observations, that in plots of the 1-D profile one part of the galaxy, like the outer region, doesn't fit so well, as though the code "does not see" that part of the galaxy. This seems to make intuitive sense from a statistical standpoint because, as some reason, 1-D codes are more sensitive to flux in the outer region, since each data point is derived by averaging over many pixels, whereas 2-D algorithms work on low signal-to-noise pixels in place. This erroneous understanding of statistics leads one to conclude that the outer region should be given a bit more weight than the inner region of the galaxy. This issue comes up in discussions about the merits of 1-D vs. 2-D analysis and, as explained in this section, the notion that 1-D data have higher signal is based on a misunderstanding of how statistics work: there is no more information in a 1-D surface brightness profile than there is in a 2-D image. What is true is that 1-D surface brightness fits and 2-D fits may not always agree, but that is not caused by the weights. Rather, the difference comes from the process of extracting a 1-D profile, which loses spatial information compared to analyzing a 2-D image.

Figure 8a: Image of quasar PKS 0736+01, from Bennert et al. (2008, ApJ, 677, 846). Original data from the Hubble Space Telescope showing the quasar+host galaxy and the surrounding area. The inset shows the same image but with a deeper stretch.

Figure 8b: This surface brightness profile is constructed by first fitting the data image in 2-D (with Poisson weighting); 1-D profiles are then extracted in some way, e.g. by running IRAF/ellipse independently on the 2-D images of the data and the model. The fit is not done in 1-D; the plot is only a visualization. The square points are extracted from the data with some pixel masking, the dashed lines are the bulge and disk model components, and the solid line is the sum of the two components. The lower window shows the disagreement (difference) between the data and the model in more detail. The blue line is the galaxy profile if no mask is used.

Figure 8c: The residual image corresponding to the surface brightness profile plot of Figure 8b (in the spacecraft orientation).

Here's a specific example (see figures on the right) taken from work I did with Profs./Drs. Vardha Bennert and Gabriela Canalizo (2008), where the brightness profile is that of a galaxy hosting a luminous quasar; Figure 8c shows what the image looks like when the central quasar and the galaxy model are removed. The 1-D plot (Figure 8b) shows a clear excess in the outer region (beyond a radius of 20 arcsec): the profile data points (squares) lie above what the model looks like when the object is fitted in 2-D, shown as the solid black line (the dashed lines sum to yield the solid). Because the analysis is done in 2-D, and only visualized in a 1-D plot, this is often interpreted as evidence that 2-D analysis places too much weight on the central region (due to the higher signal-to-noise there) at the expense of the outer wings. So some people hypothesize that weighting all the pixels in the image uniformly would solve the problem by shifting weight to the outer region.

If you were to do the analysis using uniform weights, it does indeed seem to produce the desired effect of reducing the outer-wing residuals, as long as the data are displayed as a 1-D plot. So what's wrong with believing that uniform weights give the outer region higher weighting?

It might surprise most people to hear that, quite contrary to their belief, weighting all pixels uniformly actually gives higher weight to the bright central regions of an image in 2-D fitting. The explanation is most easily seen from the definition of χ², in which the uncertainty σ appears in the denominator: each pixel contributes (data − model)²/σ² to the sum, so the larger the σ, the smaller the weight. If you don't want a fitting algorithm to "see" a pixel, or a region of an image, you make σ really large there; that region then contributes essentially zero to the χ² sum. Under Poisson weighting, σ² scales with the pixel flux, so bright pixels are individually down-weighted; making all pixels have uniform weights therefore increases the importance of the bright pixels relative to the faint ones by a factor of (flux_bright/flux_faint) compared to Poisson weights.
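As a sanity check on this weighting argument, here is a small sketch (plain Python/NumPy with hypothetical flux values; this is not GALFIT code) comparing the relative χ² weight of a bright and a faint pixel under the two schemes:

```python
import numpy as np

# Sketch: compare the relative weight of a bright pixel vs. a faint pixel
# under Poisson and uniform weighting. The chi^2 weight per pixel is 1/sigma^2.

flux_bright = 10000.0   # counts in a bright central pixel (hypothetical)
flux_faint = 100.0      # counts in a faint outer-wing pixel (hypothetical)

# Poisson weighting: sigma^2 = flux, so weight = 1/flux.
w_bright_poisson = 1.0 / flux_bright
w_faint_poisson = 1.0 / flux_faint

# Uniform weighting: every pixel gets the same sigma, hence the same weight.
w_bright_uniform = 1.0
w_faint_uniform = 1.0

# Relative importance of the bright pixel (bright/faint) under each scheme:
rel_poisson = w_bright_poisson / w_faint_poisson   # flux_faint/flux_bright
rel_uniform = w_bright_uniform / w_faint_uniform   # 1

# Going from Poisson to uniform boosts the bright pixel's relative weight
# by flux_bright/flux_faint:
boost = rel_uniform / rel_poisson
print(boost)   # ~100, i.e. flux_bright/flux_faint
```

So the bright center, not the faint wing, is what gains influence when the weights are made uniform.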

So then, why does making all weights uniform produce a more desirable fit to the wing of the galaxy, at least to the eye in a 1-D plot? The first of the two main reasons is precisely that when more weight is given to bright pixels, the central region is fitted better than the outer regions. This causes the sky (which is also fitted as a free parameter) to be elevated to a higher value, because the sky is one degree of freedom that can be used to minimize the residuals in the inner region. In 2-D fitting, this causes the wings to be over-subtracted more than under Poisson weighting, thereby artificially truncating the galaxy profile; in the extreme where the outer points don't matter at all, the effect is the same as shrinking the image fitting region. Now consider the process of extracting a 1-D profile from a 2-D image: one first has to subtract the sky -- the sky being removed is now a higher value than before, which makes the 1-D profile "look" as though the fit got rid of the wing better, but not for the reason you think.

The second reason is psychological: when the data points (squares) lie above the model fit in the wings, as shown in Figure 8b (solid line at R > 18 arcsec), people judge this to be evidence that the central region is over-weighted because, as they reason, the model does not "see" the flux in the wing. Whereas, if the data points were to lie *below* the model, our inclination is to explain away the disagreement as a "break" in the exponential profile, due either to tidal truncation or spiral arms, so the disagreement seems less troubling. The upshot is that when there is excess flux in the wings, people usually blame the weights, but less so when there is a deficit.
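The sky over-subtraction effect can be sketched numerically; the exponential disk parameters and sky values below are made up purely for illustration:

```python
import numpy as np

# Sketch: show how subtracting an over-estimated sky artificially truncates
# the outer wing of a 1-D surface brightness profile. All numbers are
# hypothetical.
r = np.linspace(1.0, 40.0, 40)          # radius in arcsec
profile = 100.0 * np.exp(-r / 8.0)      # exponential disk, counts per pixel
sky_true = 10.0
data = profile + sky_true

good = data - sky_true                  # sky subtracted correctly
bad = data - (sky_true + 0.5)           # sky over-estimated by 0.5 counts

# Surface brightness in magnitudes (arbitrary zeropoint): mu = -2.5 log10(flux)
mu_good = -2.5 * np.log10(good)
mu_bad = -2.5 * np.log10(np.clip(bad, 1e-6, None))

# In the wings the over-subtracted profile plunges: the apparent "excess" is
# gone, but only because real galaxy flux was removed along with the sky.
print(mu_bad[-1] - mu_good[-1])   # a sizable positive offset at large radius
```

A 0.5-count error in the sky, negligible near the center, changes the outermost point by more than a magnitude: this is why the uniform-weight fit "looks" better in the wing of the 1-D plot.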

But, doesn't the bottom line that the agreement *seems* better between the 1-D profiles argue for using uniform weights? The answer is still no, and we now discuss why by asking the following question.

What fitting box size should I use?

What convolution box size should I use?

How do I create a point spread function (PSF) for convolution?

Creating a point spread function (PSF) is not difficult, but creating a good PSF can be rather subtle and occasionally technical. When you think about a PSF, the picture that should come immediately to mind is a bright and isolated star, with nothing else around and no sky background. It is useful to understand that the PSF is, technically, an image of a perfect point source rather than "just an image of a star": it is how a perfect point is transformed into an image by reflecting and refracting optics and other media (e.g. the atmosphere). In astronomy, we only care about far-field PSFs, where the wavefront from a point source impinging on the primary mirror of a telescope comes from infinity and is in principle flat, as opposed to the highly curved wavefront of near-field PSFs. That far-field wavefront may not be perfectly flat if it has passed through an atmosphere, or some other refractive medium, before reaching the telescope. The PSF that observers usually care about is the overall PSF from light that has gone through the atmosphere, the telescope optics, and whatever else before or in between; that is why, when we talk about a PSF, most of the time we mean an image of a star. However, the PSF may very well change with time and physical location on a detector. A PSF is not just a property of the telescope: it is a property of every scattering and refracting interstellar and atmospheric medium, as well as all telescope and reimaging optics, between the emitting point source at infinity and the detector.

A PSF can be obtained from an image, from a theoretical model, or some combination thereof. A theoretical model can be simple, like a Gaussian function, or it can be quite sophisticated: for example, for data from the Hubble Space Telescope, theoretical models are made by a program called TinyTim. However, for high-contrast analysis (e.g. quasar host galaxies), theoretical PSFs are usually not ideal, because the profile may not be realistic, or the PSF of the data may change with time and position on a detector. So ideally one should prefer to extract PSFs empirically from the data images (which have also undergone the same image processing that may distort the PSF) rather than use theoretical models. However, doing so from data is not always easy, because there are usually few, if any, stars that are bright, unsaturated, and isolated. Therefore one often has to make do with faint stars, or come up with a hybrid technique that combines analytic models with empirical data. For instance, if there are only faint stars, it might be possible to create a high-quality PSF from the data by adding them up to increase the signal-to-noise (S/N). Or one can fit analytic models to the stars themselves and use the analytic model as the PSF, which would then have effectively infinite S/N. A third way is to take a theoretical model and convolve it with one or more functions to try to match the faint stars in an image. The details of how to do so will be explained below.
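As an illustration of the first approach (stacking faint stars), here is a simplified sketch in Python/NumPy. The function name `stack_stars` is hypothetical, and the recentering is crude integer-pixel shifting; a real pipeline would use sub-pixel centroiding, background subtraction, and outlier rejection:

```python
import numpy as np

def stack_stars(cutouts):
    """Average square star cutouts after peak-centering and flux normalization.

    Sketch only: integer-pixel recentering on the brightest pixel, with each
    star normalized to unit flux so faint and bright stars count equally.
    """
    size = cutouts[0].shape[0]
    stack = np.zeros((size, size))
    for cut in cutouts:
        # Recenter on the brightest pixel (crude; sub-pixel centroiding is better)
        py, px = np.unravel_index(np.argmax(cut), cut.shape)
        shifted = np.roll(np.roll(cut, size // 2 - py, axis=0),
                          size // 2 - px, axis=1)
        stack += shifted / shifted.sum()   # normalize each star to unit flux
    psf = stack / len(cutouts)
    return psf / psf.sum()                 # final PSF sums to 1
```

Stacking N stars of comparable brightness raises the S/N of the combined PSF by roughly √N, which is the whole point of the exercise when only faint stars are available.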

What PSF size should I use?

Why are the errorbars / uncertainties quoted in "fit.log" often so small?

Getting realistic errorbars from image fitting is often a tricky business. For errorbar values to be realistic, the requirements are that: (1) the residuals after subtracting away the model are purely due to Poisson noise, i.e. the model is a perfect fit to the data everywhere in the image, and (2) the noise has a Gaussian distribution around the best-fit model. These two criteria are almost never met in galaxy fitting, whether because the models are rarely ever ideal for the data, the sky is rarely ever perfectly flat, or there are stars in the image, neighboring contamination, correlated noise, etc. The size of the errorbar is related to how much χ² changes when one tweaks a parameter α_i by ε around the best-fit value, while allowing the other parameters (often correlated to some extent) to readjust to compensate, i.e. to minimize the residuals at the new value α_i + ε. The amount of tweak ε needed to increase χ² by 1 (Δχ² = 1) is formally defined as the 1-σ errorbar (σ_i = ε). When a model is non-ideal, a small change in a parameter increases χ², but the other parameters may not be able to compensate to reduce the residuals. The consequence is that it takes only a small perturbation to increase χ² by 1, which is why the quoted errorbar values may be too small.
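The Δχ² = 1 definition can be demonstrated with a toy problem; the straight-line model, noise level, and random seed below are assumptions for illustration only:

```python
import numpy as np

# Sketch: measure a 1-sigma errorbar from the Delta-chi^2 = 1 rule for the
# slope of a one-parameter model y = a*x, with Gaussian noise of known sigma.
rng = np.random.default_rng(0)
sigma = 1.0
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, sigma, x.size)

def chi2(a):
    return np.sum((y - a * x) ** 2 / sigma ** 2)

# Best-fit slope (weighted least squares has a closed form for this model)
a_best = np.sum(x * y) / np.sum(x * x)
chi2_min = chi2(a_best)

# Sweep epsilon until chi^2 rises by 1: that epsilon is the 1-sigma errorbar
eps = 0.0
while chi2(a_best + eps) - chi2_min < 1.0:
    eps += 1e-5

# Analytic expectation for comparison: sigma_a = sigma / sqrt(sum(x^2))
sigma_a = sigma / np.sqrt(np.sum(x * x))
print(eps, sigma_a)   # the two agree closely when the model is ideal
```

Here the model is ideal and the noise is Gaussian, so the Δχ² = 1 sweep recovers the analytic errorbar; when the model is non-ideal, as described above, the same sweep returns an ε that is too small.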

There are other reasons for why errorbars may be too small besides imperfect models, such as image scaling, or PSF mismatch (another form of imperfect model). And as always, it is good to try to identify and understand the different factors separately:

How do I fit or interpret the Fourier modes?

How do I fit spiral structures?

Why can't I fit for the total luminosity for truncation (damping) models?

How do I obtain the total luminosity for truncation (damping) models?

Why does the GALFIT iteration, i.e. "countdown", clock keep bouncing up and down?

Why does IRAF display give "ERROR: FXF: inherit not permitted when PHU has NAXIS != 0"?

If the problem is with displaying a GALFIT image block, solve by doing:

hedit imgblock.fits[1] inherit delete+ update+ ver-

However, to solve the problem more generally, you should delete the inherit keyword from the original image.

Go back to GALFIT Home Page.
Go back to Chien Peng's Home Page.

Chien Peng