How Sensors Work

From the number of emails I receive and posts on various digital camera user groups, it seems that many digital camera users still don't know how the primary function of their camera works. It doesn't help to have representatives from camera companies: 

  • Make untrue claims ("Our camera doesn't interpolate, you get the real pixels")
  • Obscure details in marketing claims ("Our camera produces 12.1 effective megapixels")
  • Fail to understand basic optic theory ("The smaller sensor size produces 1.5x magnification, making all your lenses longer in focal length")
  • Make misleading statements about competitors ("They use a CMOS sensor, which is noisier than a CCD sensor")
  • Make misleading statements about themselves ("Our sensor doesn't produce moire because it isn't a Bayer pattern")
  • Make claims that most people don't have the knowledge to interpret correctly ("Industry leading 88db signal-to-noise ratio")

The list is seemingly endless.

It's important to understand how sensors function if you want to get the best possible results from your camera. So I'll step you through what happens, and the issues associated with a number of practical problems you'll encounter. 

A camera focuses light on a plane behind the rear element of the lens. In 35mm cameras, that plane contains film. In digital cameras, the plane is occupied by a piece of silicon (chip) which I'll refer to as the sensor. Sensors historically have come in two primary types: CCD (charge-coupled device) and CMOS (complementary metal-oxide-semiconductor).

A few words about CCD versus CMOS before we continue: 

CCD is an older and mature technology. It is easy to design and produce. Most makers have moved away from it for a variety of reasons, though, as the advantages of CCD are becoming outclassed by what engineers are doing with CMOS. CCD does have the advantage of using line or frame transfers, the latter of which has some temporal advantages for video. CCD sensors have slightly more tendency towards blooming (letting electrons escape from one location to a neighboring one). Again, CCD is a mature technology. Most of the major advances we've seen in CCD tech occurred long ago. 

CMOS has the advantage of sometimes being less expensive to manufacture in quantity, but more importantly it is able to contain more complex internal electronics at each individual sensing area (called a photosite). The latest CMOS sensors (the Sony EXMORs, for example) have exceedingly low read noise and very large dynamic range. In general it takes more money and a longer time to design a CMOS sensor, but then you get the advantage of lower cost and higher capability. Long term, CMOS or some variation of it is where you'll see most sensors going. But CMOS does have some issues. One is that you need very fast support circuitry in order to avoid issues such as rolling shutter. Because CMOS addresses pixels directly, most CMOS sensors don't capture a "frame" of video simultaneously and offload that to external circuitry. Instead, rows of pixels are read sequentially, meaning the top of the frame is captured ever so slightly sooner than the bottom of the frame in video captures.

Nikon has shown yet another variation in sensor design: a CMOS variant that uses a different transistor type (the LBCAST sensor used in the D2h and D2hs), so we will continue to see a stream of new acronyms as technology and material use moves inexorably on in the low levels of each sensor. On the other hand, some terminology is faux: the so-called LiveMOS sensors used in m4/3 cameras are really nothing more than CMOS with a fancy marketing name. Personally, I don't put much weight into whether a product is CCD or CMOS. I've seen good and bad iterations of both. 

With Kodak now out of the sensor business, one of the key sources of large CCD sensors has temporarily dried up. All Nikon DSLR and Nikon 1 sensors are CMOS. Most of the Coolpix sensors are CMOS as well. All of the Canon DSLR sensors are CMOS, and most of their compact cameras use CMOS. As I write this (early 2012), Leica is the primary remaining supplier of mostly CCD-based cameras (M8, M9, S2), though they too are now using CMOS on newer cameras.

On a sensor is an array of light-sensitive spots, called photosites. Photosites are usually square in shape (there have been two major exceptions that I'll deal with in a moment), and laid out in rows and columns.

The first thing that catches newcomers to digital cameras unaware is this: the light-sensing portion of the photosites—called the photo diode—does not necessarily cover the entire surface area of the sensor. With some sensors the active light gathering area has been as little as 25% of the total surface area of the chip (called the "fill factor"). Yes, that means that there can be non-light responsive space between adjacent photo diodes. (Pictures you might see of sensors that show adjacent red, blue and green positions are usually of an array of microlenses and filters that lie on top of the actual sensor, and not the photosite and embedded photo diode.)

In recent times we've had BSI (back side illumination) sensors, where the fill factor is higher as the light photons don't have to go down deep into the sensor to be changed into electrons. But BSI generally is most useful on small sensors (phones, compact cameras), so we haven't seen it on large sensor cameras (DSLRs).

I noted that there have been exceptions to the square photosite rule. The most important for Nikon users are:

  • The Nikon D1x. The D1x takes two adjacent square photosites and doubles them into a rectangular photosite. 
  • The Fujifilm SuperCCD (used in the Fujifilm S1, S2, S3, and S5 bodies). The SuperCCD uses quasi-octagonal photosites. Fuji further places the "grid" for the photosites at an angle, though for all practical purposes it still has rows and columns of photosites, just oriented at a 45 degree angle.

You might wonder just how large the individual photosites are. On the original D1, they were 11.8 microns square, which is quite large (though technically, this was a group of four "binned" photosites that worked together). On a Coolpix 990, they were 3.45 microns square, which was considered small then, but would be considered large for current compact cameras. The tendency has been towards smaller photosites, partly because photo diode efficiencies have gotten better. Thus, current DSLRs tend to have photosites in the 4 to 8 micron square range, while current compact cameras like the Coolpix can have photosites as small as about 2 microns square (some camera phones have even smaller photosites, ranging down to 0.9 microns). 

Note that it's the area that's important. A 3 micron photosite has 9 square microns of area (of which only a portion may be sensitive to light). An 8 micron photosite has 64 square microns of area, or about seven times as much. That turns out to be fairly important.

Dark Current and Well Overflow

A photosite essentially converts the energy from a light wave into photo-electrons. Light is actually what physicists call a "wave-particle duality." The energy in the light, which is what we're trying to collect, resides in particles called photons. The longer a photosite is exposed to light, the more photons are converted into photo-electrons via the photo diode at each photosite. One photon = one photo-electron maximum (you can't gain energy in the transfer in current designs). To some degree, photo diode size is directly related to effective ISO sensitivity, as a larger surface area exposes it to more light in any given amount of time than a smaller surface would. You'll note that the larger photosite DSLRs tend to have base ISOs of 200 and the smaller photosite DSLRs tend to have a base ISO of 100, for example. 

The physical size of the individual photosites is important beyond effective ISO. The larger the active light gathering surface, the less certain types of noise can be a problem. That's because every piece of silicon has a baseline level of electron "action" (current). In sensors, this current is usually called Dark Current or Dark Noise (the "dark" in the name implies that the current was formed despite no exposure to light). (There are actually several different underlying types of on-chip noise, but for simplification, I'll just refer to Dark Current in this article.)

Dark Current increases with temperature. This is due to the small gap between the valence and conduction bands within the silicon: the gap is so small that higher temperatures cause more electrons to cross the gap to where they don't belong. Fortunately, it takes really hot temperatures to increase Dark Current to visible, troublesome noise (typically 90 degrees Fahrenheit or higher coupled with long shutter speeds, at least for the smaller Sony sensors used in the Coolpix and most other consumer cameras). At very long shutter speeds (usually 1 second or longer) some of this electron activity can also result in "hot pixels," essentially generated by photosites that prove "sticky" to those wandering electrons due to impurities in the silicon. The longer the shutter speed or higher the temperature, the more likely you'll see some hot pixels in your image. 

Every digital camera attempts to deal with dark current by "masking off" a set of photosites so that they don't see light (which is part of the explanation why your supposedly 3.34-megapixel camera produces images with only 3.15 megapixels). Your camera's brains compare the values they see from photosites that weren't exposed to light to those that were. Dark Current is partially random. So, in the most simplistic form, the camera averages all the values found in the masked off photosites and subtracts that from the values seen by the photosites exposed to light to remove the Dark Current. Fortunately, better algorithms have been developed over the years, and some sensors are quite good at using this masked area to remove underlying noise.
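The "most simplistic form" of that subtraction can be sketched in a few lines. This is only an illustration under my own assumptions: the function name, the lists of readings, and the numbers are all made up, and no real camera's firmware is being described.

```python
from statistics import mean

def subtract_dark_level(exposed, masked):
    """Subtract the average masked-photosite reading from each exposed reading.

    exposed: hypothetical readings from photosites that saw light
    masked:  hypothetical readings from photosites shielded from light
    """
    dark_level = mean(masked)
    # Clamp at zero: a photosite can't report less than no light.
    return [max(0, value - dark_level) for value in exposed]

masked_readings = [98, 103, 101, 99, 104, 95]   # made-up values, average is 100
exposed_readings = [150, 1100, 5100, 90]        # made-up values
print(subtract_dark_level(exposed_readings, masked_readings))
```

Note that the dimmest reading falls below the dark level and gets clamped to zero, which is one reason a simple average loses shadow detail.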

Many current production digital cameras go further than that, however. Individual photosites can and do have slightly different responses to light and to current, so many modern cameras do something a bit different on long exposures: they take two pictures with the photosite array, one exposed to light and one not (usually called Long Exposure Noise Reduction). Then the pattern seen in the exposure without light is subtracted from the one for the exposure exposed to light. (You can do this yourself, by the way. When you take a picture in low light with long shutter speeds, especially in warmer temperatures, put the lens cap on and take another shot at exactly the same shutter speed. In Photoshop you can use the second exposure to remove patterned noise from the first. But make sure that your dark current exposure is taken at the same temperature as the first! I've seen people take their first photo outside at night in the cold, then bring the camera to the warmer indoors while the second dark current exposure is being made. That won't work: the two exposures need to be done at the same temperature.)
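Here is a minimal sketch of that dark frame subtraction, assuming each exposure is just a grid of numbers. The function name and sample values are hypothetical, and real raw converters do considerably more than this.

```python
def subtract_dark_frame(light_frame, dark_frame):
    """Subtract a lens-cap exposure from a normal exposure, photosite by photosite.

    Both frames are lists of rows of raw values and must be the same size.
    """
    if len(light_frame) != len(dark_frame) or any(
        len(light_row) != len(dark_row)
        for light_row, dark_row in zip(light_frame, dark_frame)
    ):
        raise ValueError("frames must have the same dimensions")
    return [
        [max(0, light - dark) for light, dark in zip(light_row, dark_row)]
        for light_row, dark_row in zip(light_frame, dark_frame)
    ]

# Made-up data: the 900 in the first row is a hot pixel that also shows up
# (as 800) in the dark frame, so the subtraction removes most of it.
light = [[120, 125, 900], [118, 122, 121]]
dark = [[20, 22, 800], [19, 21, 20]]
cleaned = subtract_dark_frame(light, dark)
print(cleaned)
```

Because the subtraction is done per photosite rather than with one average, it removes the fixed pattern, which is why the two exposures have to match in shutter speed and temperature.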

At the other end of the spectrum, what happens to a photosite when it contains too many photo-electrons (due to too much exposure to light)? Well, if left to its own devices, the information (electrons) can spill from one photosite to another, corrupting the data in the adjacent site (a concept called "blooming," or well overflow). This is especially true in the physically small photosites of the Sony sensors used in older Coolpix models (proximity makes it easier for an electron to escape from its current owner to another). Most sensors have "drain" circuits that attempt to remove excess electrons before they degrade the chip's data too badly, but these circuits are far from perfect, and it's possible to overload them, as well.

I've been speaking about something without really identifying it: the electron well in the photosite. The photo diode in the photosite converts the light photons to electrons, but the photosite needs somewhere to store those electrons until the sensor is asked to produce the "value" for each photosite. It does this in an electron well buried in the photosite. Just like the size of the photo diode varies in sensors, so does the size of the electron well. Compact cameras have smaller wells than DSLRs, for example. Let's just use some arbitrary numbers to see why that's important.

Consider a compact camera photosite well that can hold 10,000 electrons versus a DSLR well that can hold 100,000 electrons. If the baseline noise level within the sensor itself is 100 electrons, then the signal to noise ratio in the compact camera has a maximum value of 100:1 while the DSLR's signal to noise ratio is 10 times better. That's just one reason why the compact cameras produce more noise—all else equal—than do DSLRs. 
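The arithmetic is simple enough to check yourself. This sketch uses the same arbitrary numbers as above and the same simplified model (a single fixed baseline noise level); the function name is mine.

```python
def max_signal_to_noise(well_capacity, noise_floor):
    """Best-case signal-to-noise ratio for a completely full electron well.

    Both arguments are electron counts. Other noise sources are ignored.
    """
    if noise_floor <= 0:
        raise ValueError("noise_floor must be positive")
    return well_capacity / noise_floor

compact = max_signal_to_noise(10_000, 100)   # arbitrary compact camera well
dslr = max_signal_to_noise(100_000, 100)     # arbitrary DSLR well
print(f"compact {compact:.0f}:1, DSLR {dslr:.0f}:1, advantage {dslr / compact:.0f}x")
```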

Sensor design is old enough now that we're bumping against some physical limits in almost all the areas just mentioned. Photo diodes are getting to be about the maximum size they can be within the photosite without some new technological breakthrough. Baseline noise is about as low as can be mass produced with current materials. Electron well sizes are about maximized for the current photosite sizes. You can make some adjustment to one of the variables, but it tends to make you have to reduce one of the other variables. Thus, we currently see much more emphasis on post processing the sensor data produced and getting cleaner results by taking out problems after the fact. 

However, that doesn't mean we're done with breakthroughs, only that it takes a complete rethink of an element to make significant progress. One such thing is to reverse the orientation of the photo diode so that it's on the surface rather than buried further down in the silicon (sometimes called a "backlit sensor", or more commonly referred to as BSI). Different materials—often more expensive and more difficult to work with—can change the baseline responses. Better microlenses and filtration can get more of the original light to the photo diode itself. We still have plenty of room for improvement, and I think we'll continue to see a similar level of improvements over the next four years that we saw in the previous four. Further out than that is difficult to predict, as it would take some new breakthrough technologies to continue the improvement curve further.

Your Digital Camera Sees in Black and White

It may surprise you to find out that the sensor in your camera reacts to all light with relative equality. Each individual photosite simply collects only the amount of light hitting it and passes that data on; no color information is collected. Thus, a bare sensor is a monochromatic device.

Plenty of ways exist to make monochromatic information into color data. For example, you could split the light coming through the lens to three different sensors, each of which was tuned to react to a certain light spectrum (some video cameras do that). But most digital still cameras use a different method: they place an array of colored filters over the photosites. One filter array is commonly used, and several others are possible:

  • RGBG (Bayer). This array arrangement usually involves odd-numbered rows of photosites covered by alternating red and green filters, with even-numbered rows covered by alternating green and blue filters. Called the Bayer pattern after the Kodak engineer who invented it, this filter array uses the primary colors of the additive mixing method (used in television and computer monitors). One unique aspect of the Bayer pattern is that each "color" cell has two of each of the alternatives as its neighbors. Most current digital cameras use a Bayer pattern array. This is the method used in the Canon and Nikon DSLRs.
  • Diagonal color. One common alternative to the Bayer arrangement, called the diagonal color pattern, is where each row has repeating RGB elements and each row is staggered by one element (i.e., the first row is RGBRGBRGB..., the second row is GBRGBRGBR..., the third row is BRGBRGBRG...), but it is not currently used in any digital camera I know of. The Fujifilm X-Pro1 and other cameras using the X-Trans technology use a relative of this pattern that has less color information than luminance, but it's not exactly a diagonal pattern.
  • Add a Color. There are some very sophisticated patterns of R, G, and B filtration where the second green is replaced by something else. Sony tried emerald, and Kodak had a version with no filtration on that position. You need fairly high pixel counts to mask the low-level artifacts caused by these patterns, but you can tune them for either color discrimination or luminance gathering. 
  • CYMG. Alternatively, a slightly more complex filter array related to both Bayer and Add a Color uses the primary colors in the subtractive process (commonly used in printing technologies) plus green. This was the method used in most early Coolpix models beginning with the 900 series (i.e., the 885, 995, 2500, 4500, 5000, and 5700 all use this pattern). CYMG is typically used on sensors that are sensitive to noise at low light levels, as the dyes used to create the CYM colors are lighter than RGB and thus let more light through to hit the photosites.

Each of these methods has advantages and disadvantages. The repeat of the green filter in Bayer patterns (and addition of a green filter to the subtractive CYM method) is due partly to the fact that our eyes are most sensitive to small changes in green wavelengths. By repeating (or adding) this color in the filter, the accuracy of the luminance data in the critical area where our eyes are most sensitive is slightly improved.

So, each individual photosite has a filter above that limits the spectrum of light it sees. Later in the picture-taking process, the camera integrates the various color information into full-color data for individual pixels (a process sometimes called interpolation, but more accurately called demosaicing).

But one important point should be made: the color accuracy of your digital camera is significantly influenced by the quality of the filter array that sits on top of the photosites. Imagine, for a moment, a filter array where each red filter was slightly different—you'd have random information in the red channel of your resulting image. A damaged filter would result in inaccurate color information at the damage point.

One thing that isn't immediately apparent about the Bayer pattern filter is that the ultimate resolution of color boundaries varies. Consider a diagonal boundary between a pure red and a pure black object in a scene. Black is defined as the absence of light reaching the sensor, thus the data value would be 0 (for the G and B photosites). That means that only the photosites under the red filters are getting any useful information! Fortunately, pure red/black and blue/black transitions don't occur as often as you'd think, but it is something to watch out for. (Since no individual color is repeated in a CYMG pattern, all boundaries should render the same, regardless of colors.)

Most sensors these days are built with microlenses that incorporate the filter pattern below them and sit directly on top of the photosites. This microlens layer not only incorporates the Bayer filter pattern just underneath it, but redirects light rays that hit at an angle to move more perpendicular to the photosites. If light were to hit the photosites at severe angles, not only would the photosite be less likely to get an accurate count of the light hitting it, but adjacent cells would tend to be slightly more influenced by the energy since the filters sit above the photosites and have no "guards" between them. All Nikon cameras currently use microlens layers; the old Kodak DCS Pro 14n is unusual in that it didn't.

On top of the microlenses is yet another set of filters that takes out the ultraviolet (UV) and infrared (IR) light spectrum and provides anti-aliasing (I'll discuss anti-aliasing in the next section). Current cameras allow very little light outside the visible spectrum to get to the photosites, though many older ones often let significant IR through.

We have one more exception to talk about (sensors have gotten complicated since I first wrote about them in the 1990s). That's the Foveon sensor (now owned by Sigma, whose cameras are the only ones that use it). Unlike the Bayer-pattern sensors that get their color information by using adjacent photosites tuned to different spectrums, the Foveon sensor uses layers in its photosite design. The primary benefit of this approach is that it gets rid of color aliasing issues that occur when you demosaic Bayer-pattern data, and thus allows you to get rid of (or at least lower the value of) the antialiasing filter over the sensor. The benefit can be described in two words: edge acuity. Another benefit is that there is no guessing about color at any final pixel point, which means that colors are robust and accurate. The primary disadvantage to this approach has to do with noise. Obviously, less light gets to the bottom layer of each photosite than the top layer. Foveon has done a remarkably good job of mitigating the drawbacks while emphasizing the positive in the latest iteration of the sensor.

Getting Data Off the Sensor

At this point, we have an array of filtered photosites that respond to different colored light that usually looks something like this:
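A small sketch can print such an array as text, assuming the Bayer layout described earlier (red and green alternating on odd-numbered rows, green and blue on even-numbered rows). The function names are illustrative only.

```python
from collections import Counter

def bayer_color(row, col):
    """Filter color over the photosite at (row, col), counting from 0.

    Row 0 is the first (odd-numbered) row: R G R G ...
    Row 1 is the second (even-numbered) row: G B G B ...
    """
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

def bayer_mosaic(rows, cols):
    """Build the filter layout for a rows x cols block of photosites."""
    return [[bayer_color(r, c) for c in range(cols)] for r in range(rows)]

mosaic = bayer_mosaic(4, 8)
for line in mosaic:
    print(" ".join(line))

# Tally how many photosites sit under each filter color.
counts = Counter(color for line in mosaic for color in line)
print(dict(counts))
```

The tally shows the repeated green: half the photosites are green, and a quarter each are red and blue.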

The data at each of the individual photosites, by the way, is still in analog form (the number of electrons in the well). The method by which that data is retrieved may surprise you, however: in most CCD sensors the values are rapidly shifted one line at a time to the edge of the sensor. This process is called an interline (or row) transfer, and the pathways that the data moves down are one of the reasons why photosites have space between them (to make room for the pathway). While the data is moved off in "rows," it's important to note the short axis is usually the direction that the data is moved (if you're looking at a horizontal image, you'd see these as columns). (CMOS sensors, such as those used in the Canon, recent Nikon, Kodak Pro DSLRs, and a few other cameras are unique, in that the data for each individual photosite can be retrieved directly.)

As the data moves to the edge of the sensor, it is usually first processed to reduce noise, then read by A/D converters (ADC). Now what we have is a series of digital values (8-bit for many Coolpix and consumer cameras, 12-bit to 16-bit for most SLR models). On many recent CMOS designs (starting with the Nikon D90/D300, for example), the ADC function is actually built into the sensor itself, typically with a converter for each column of photosites. This tends to reduce "read noise" because the ADC sits close to the electron wells being counted and thus transmission errors are less likely to come into play.

One common misconception is that bit depth equates to dynamic range (the range of dark to bright light that can be captured). This isn't true. Dynamic range of a camera is determined mostly by the sensor (the ratio of electron well capacity to baseline noise determines the maximum range of exposure tolerated; another reason why larger photosites are better than small). If you put a 4-bit, 8-bit, 12-bit, and 16-bit A/D converter on the same chip, the sensor wouldn't respond to low or bright levels of light any differently; you'd only get more or less tonal definition in the conversion.
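To see the difference between range and tonal definition, here is a rough sketch using the arbitrary DSLR numbers from earlier (a 100,000 electron well and 100 electrons of baseline noise). Expressing the range in stops, and assuming the converter spreads its levels evenly across the full well, are my simplifications; the function names are hypothetical.

```python
import math

def dynamic_range_stops(well_capacity, noise_floor):
    """Sensor range in stops (doublings of light) from noise floor to full well."""
    return math.log2(well_capacity / noise_floor)

def electrons_per_step(well_capacity, bits):
    """Electrons covered by one digital level, assuming evenly spaced levels."""
    return well_capacity / (2 ** bits)

well, noise = 100_000, 100   # arbitrary illustrative values
print(f"sensor range: {dynamic_range_stops(well, noise):.2f} stops")
for bits in (4, 8, 12, 16):
    print(f"{bits:2d}-bit ADC: {2 ** bits:6d} levels, "
          f"{electrons_per_step(well, bits):.2f} electrons per level")
```

The range figure never changes as the bit depth goes up; only the size of each tonal step does.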

At this point in a Bayer sensor, we have one-third the data we need for a complete color picture (we need red, green, and blue values at each photosite location, and we have only one of those values from each photosite). Here comes the tricky part: a processor (a Sparc-based computer in many early Coolpix models, dedicated proprietary circuits in most new cameras, called the EXPEED engine by Nikon and DIGIC by Canon) looks at blocks of this data and tries to guess the actual RGB color value of each pixel by comparing adjacencies! (The demosaicing I mentioned above. A simple demosaicing routine might work this way: (1) record the existing R, G, or B value at each pixel position; (2) invent new G values at each of the R and B photosite positions, often using a multiple pass technique to figure out where edges occur; and (3) fill in the missing R and B values using neighbor sampling techniques. Hundreds of more sophisticated variants are now used, with most trying to deal with the minor artifacting issues created by the simple routines.)
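To make the guessing concrete, here is a deliberately crude sketch. It is simpler than even the three-step routine just described: it keeps the measured value at each position and fills in the two missing colors by averaging whatever same-colored neighbors sit in the surrounding 3x3 block, with no edge detection at all. The layout function and all names are my own assumptions, not any camera maker's method.

```python
def bayer_color(row, col):
    """Filter color at (row, col), counting from 0: R G on even rows, G B on odd."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

def demosaic_neighbor_average(raw):
    """Turn a grid of single-color photosite values into (R, G, B) pixels."""
    rows, cols = len(raw), len(raw[0])
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 block of photosites")
    image = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            pixel = {}
            for channel in "RGB":
                if bayer_color(r, c) == channel:
                    pixel[channel] = raw[r][c]   # the one value actually measured
                else:
                    # Guess from same-colored photosites in the 3x3 neighborhood,
                    # clipped at the edges of the sensor.
                    samples = [
                        raw[nr][nc]
                        for nr in range(max(0, r - 1), min(rows, r + 2))
                        for nc in range(max(0, c - 1), min(cols, c + 2))
                        if bayer_color(nr, nc) == channel
                    ]
                    pixel[channel] = sum(samples) / len(samples)
            out_row.append((pixel["R"], pixel["G"], pixel["B"]))
        image.append(out_row)
    return image

# A flat, evenly lit patch: every red site reads 200, green 100, blue 50.
values = {"R": 200, "G": 100, "B": 50}
flat = [[values[bayer_color(r, c)] for c in range(4)] for r in range(4)]
print(demosaic_neighbor_average(flat)[0][0])
```

On a flat patch the guesses are exact. It is at edges and fine detail that a routine this naive produces the artifacts the more sophisticated variants try to avoid.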

Camera manufacturers are extremely secretive about their demosaicing methods. But given the unclassified data on image processing and the fact that virtually all cameras are pressed for computational power when confronted with huge amounts of data, they all tend to do similar things with near-neighbor lookups. You should know a couple of things about demosaicing (we're about to talk about anti-aliasing, which I promised earlier):

The process of reconstructing data at a "frequency" (sampling rate) less than the original produces aliasing. What that means is that the reconstructed data may not be a correct record of the original. Let's pretend our sensor is black and white only for a moment. Imagine a series of vertical black lines with white space between them. If each black line fell on one column of photosites and each white space fell on the column of photosites next to it, it should be obvious that we can capture that level of detail perfectly. But what happens if each black line falls partially on a second column of photosites? Instead of recording white, those photosites would record "gray" (part white, part black). Earlier I mentioned that the filter on top of the microlens Bayer pattern took out IR and provided anti-aliasing. Well, what anti-aliasing filtration does is get rid of the highest frequency detail, which would tend to produce the problems we just talked about. Unfortunately, anti-aliasing filters have the net effect of making the level of detail rendered appear "softer" than it would otherwise. But they also ensure that the worst artifacts associated with analog-to-digital sampling aren't encoded into the Bayer pattern data. The higher the resolution of the sensor, the less the need for anti-aliasing filtration; the 13.5mp Kodak Pro 14n didn't have an anti-aliasing filter, for instance. Technically, the D3x shouldn't need one, but it does have one. Nikon now gives us the option on the 36mp D800.
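The black-line thought experiment is easy to simulate for a single row of a monochrome sensor. In this sketch each photosite simply averages the brightness falling across its width; the function name, the `offset` parameter, and the subsampling approach are all my own illustrative choices.

```python
import math

def record_line_pattern(num_photosites, offset, subsamples=100):
    """Average brightness recorded by each photosite in one row.

    The scene is black lines (0) one photosite wide, separated by white
    gaps (1) of the same width. `offset` slides the lines sideways,
    measured in photosite widths.
    """
    recorded = []
    for site in range(num_photosites):
        white_hits = 0
        for k in range(subsamples):
            x = site + (k + 0.5) / subsamples            # a point inside this photosite
            is_white = math.floor(x - offset) % 2 == 1   # even bands are the black lines
            white_hits += 1 if is_white else 0
        recorded.append(white_hits / subsamples)
    return recorded

print(record_line_pattern(4, 0.0))   # lines aligned with the photosite columns
print(record_line_pattern(4, 0.5))   # lines shifted by half a photosite
```

With the lines aligned, the row records alternating 0.0 and 1.0, a perfect capture. Shift the same lines by half a photosite and every photosite records 0.5: the detail has turned into uniform gray.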

One of those artifacts associated with sampling frequency and demosaicing is that of moire patterns (sometimes called color noise). A moire pattern occurs when aliasing occurs on a highly detailed area. Moire can be partially removed by using complex math (involving what's known as the Nyquist frequency), but there's a real battle going on inside your digital camera between the speed at which images are processed and the amount of data the camera has to deal with. Most cameras saving into JPEG format don't do much, if any, moire processing, and rely more upon the anti-aliasing filter to reduce this artifact before the data is encoded. A few cameras, the Kodak DCS Pro 14n or D800E, for example, don't use an anti-aliasing filter. Images from those cameras tend to be slightly "sharper" and contain slightly more detail than others of the same pixel count, but this is at the expense of possible added color noise and stairstep artifacts.

Sharpness, contrast, and other camera settings may be applied during the demosaicing step or immediately afterwards, depending upon the camera's design. And JPEG compression is yet another variable that enters into the picture. Each additional manipulation of the underlying photosite data gets us a little further from the original information and introduces the potential for artifacts. In essence, by the time the camera is done with all its processing, it is impossible to reconstruct the original data (exception: most modern DSLR cameras have the ability to save the actual photosite data from the camera in a raw file, for later demosaicing on your computer).

Those using older DSLRs should know a couple of things about a few cameras: 

  • On Fujifilm's Web site, they make the contention that their SuperCCD does not interpolate to get the 6 megapixels of data the S1 produces from its 3-megapixel sensor (or 12 megapixels from 6 on the S2 Pro, S3 Pro, or S5 Pro). They claim that, because of the angular nature of their array (think of a baseball diamond where each of the bases is a photosite), they already have the X and Y values that can be used to build the intermediaries (the pitcher's mound in this pattern). Sorry, Fujifilm, but that's still interpolation.
  • The D1x is a unique camera as regards demosaicing. Most interpolation of photosite data is what is called up sampling. This means that you sample existing data to produce additional data (that's what demosaicing does, for example: you have a G value and you look at the other data around it to come up with the R and B values for that G position). You can also down sample, which would be to produce less data than the original contains. The D1x upsamples the short axis and downsamples the long axis to produce its in-camera pictures. (NEF conversion products, such as the no-longer made Bibble and versions of Capture later than 3.5, can also produce images that don't downsample the long axis while upsampling the short axis, producing far larger files.) Why Nikon chose to do the in-camera up/down sampling is unclear. One would think that you wouldn't want to downsample sensor data normally, but the fact that Nikon does so even in their default RAW processing seems to indicate that Nikon knows something about the validity of the data from those split photosites that we don't. It could be, for example, that Nikon downsamples that data to deal with a noise issue. Or it could be that there's a short-cut to the number-crunching that must be done to generate the full RGB image if they downsample the long axis. The reason must be a good one, however, because moire and related artifacting is dramatically lessened if sampling is done on only one axis rather than two.

Best Book for Photographers

While I was browsing my bookshelf double-checking some of my material in this report, I pulled out The Manual of Photography Tenth Edition, the highly technical and math-filled volume that defines much of the state-of-the-art. MoP is highly recommended, by the way. It's one of those books that you pull out and read sections of from time to time when you want to know the underlying theory behind something, like depth of field or fast Fourier transforms.

Much has been said about when (or whether) digital imaging passes film in the ability to resolve information. I came upon this interesting passage in MoP Ninth Edition (the culmination of several paragraphs of theory and math): "It would appear that the digital system has overtaken the photographic process with respect to this [information capacity of images] measure of performance." Note that this says nothing about resolution, only the theoretical amount of "information" contained in an image.

text and images © 2017 Thom Hogan
portions Copyright 1999-2016 Thom Hogan-- All Rights Reserved