|
SQL is the Structured Query Language, a standard means of asking for data from databases, and is used to query the Catalog Archive Server (CAS). This page provides a brief overview of SQL. Query examples are also available, with comments, as well as a page of links to more detailed off-site documentation.
Database Fundamentals
|
| a | b | a AND b | a OR b |
|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE |
| TRUE | FALSE | FALSE | TRUE |
| TRUE | NULL | NULL | TRUE |
| FALSE | FALSE | FALSE | FALSE |
| FALSE | NULL | FALSE | NULL |
| NULL | NULL | NULL | NULL |
When comparing values, you will use the COMPARISON operators:
| Operator | Description |
|---|---|
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |
| = | equal |
| <> or != | not equal |
In addition to the comparison operators, the special BETWEEN construct is available.
a BETWEEN x AND y is equivalent to
a >= x AND a <= y
Similarly,
a NOT BETWEEN x AND y is equivalent to
a < x OR a > y
Finally, the MATHEMATICAL operators (both numeric and bitwise) are:
| Name | Description | Example | Result |
|---|---|---|---|
| + | Addition | 2 + 3 | 5 |
| - | Subtraction | 2 - 3 | -1 |
| * | Multiplication | 2 * 3 | 6 |
| / | Division | 4 / 2 | 2 |
| % | Modulo (remainder) | 5 % 4 | 1 |
| POWER | Exponentiation | POWER (2.0,3.0) | 8.0 |
| SQRT | Square root | SQRT (25.0) | 5.0 |
| ABS | Absolute value | ABS (-5.0) | 5.0 |
| & | Bitwise AND | 91 & 15 01011011 & 00001111 | 11 00001011 |
| | | Bitwise OR | 32 | 3 00100000 | 00000011 | 35 00100011 |
| ^ | Bitwise XOR | 17 # 5 00010001 # 00000101 | 20 00010100 |
| ~ | Bitwise NOT | ~1 | -2 |
| AVG | Average | AVG(ModelMag_r) | |
| MIN | Minimum | MIN(ModelMag_r) | |
| MAX | Maximum | MAX(ModelMag_r) | |
| LOG | Natural Logarithm | LOG(petroMag_r) | |
| LOG10 | Base-10 Logarithm | LOG10(petroMag_r) | |
| EXP | Exponential Value (ex) | EXP(2.5) | 12.182494 |
In addition, the usual mathematical and trigonometric functions are available in SQL, such as COS, SIN, TAN, ACOS, etc..
Several SDSS tables contain bit-encoded flags to indicate various types of information about the object or quantity in question (e.g., the PhotoObjAll table and the PhotoTag view each have the flags column, SpecObj has zWarning and various targeting flags etc.).
One of the most important uses of bit flags is to indicate why an object was targeted for spectroscopy. A list of spectroscopic target flags is available on the Spectroscopic Target Flags page.
This section describes how you can test for flag values in your query. For sample queries that demonstrate the use of flags, see the Using flags, Diameter limited sample, LRG sample, Clean photometry with flags - Stars, and Clean photometry with flags - Galaxies sample queries for examples on how to use flags.To return rows for which the flag is set, the basic syntax for the constraint is:
For example, to select objects for which the BLENDED flag is set in PhotoTag, you would use a query like:
SELECT top 10 objid, flags FROM PhotoTag
WHERE flags & dbo.fPhotoFlags('BLENDED') > 0
|
and to select only objects for which the flag is NOT set, use
SELECT top 10 objid, flags FROM PhotoTag
WHERE flags & dbo.fPhotoFlags('BLENDED') = 0
|
To select objects for which all of several flags are set, generate the combined bitmask by adding the individual flag bitmasks, then compare the result of ANDing the combined bitmask with the flag column with the combined bitmask itself, e.g.,
SELECT top 10 objid, flags FROM PhotoTag
WHERE
( flags & (dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2')) )
= ( dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2') )
|
To select objects for which at least one of several flags is set, you just need to check that ANDing the combined bitmask with the flag column returns a non-zero result, e.g.,
SELECT top 10 objid, flags FROM PhotoTag
WHERE
( flags & (dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2')) ) > 0
|
To select objects for which none of several flags is set, the result of ANDing the flag column with the combined bitmask must be 0, e.g.,
SELECT top 10 objid, flags FROM PhotoTag
WHERE
( flags & (dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2')) ) = 0
|
As mentioned above, if you are running a query that is expected to match a large number of rows (millions), it is better
to first obtain the binary bitmask resulting from the multiple flag arithmetic and using that single bitmask instead of
repeated function calls to the flag functions, as described in the Using dbo
functions in your query subsection of the Optimizing Queries section below.
The SDSS photo pipeline sets a number of flags that indicate the quality of the photometry for a given object in the catalog. If you desire objects with only clean photometry for science, you should be aware that you need to filter out unwanted objects yourself in your query. This is not done automatically for you (e.g. with a view of the PhotoObjAll table). The main reason is that the flag constraints that are required for this filtering often impose a significant performance penalty on your query .
There is a single up or down flag that is available in the PhotoObjAll table (and its views) called "clean" that is
set to 1 if the photometry meets our definition of good photometry, and the use of this shorthand flag is illustrated
in the Clean Photometry sample query. This is meant to provide a simple way to select
objects with clean photometry. However, if you do not trust this or you want to be more specific and use the
individual photo flags to select objects that meet your criteria for "clean photometry", this is illustrated in two other
sample queries: the Clean photometry with flags - Stars, and
the Clean photometry with flags - Galaxies sample queries.
As mentioned in the EDR Paper, the database designates quantities that are not calculated for a particular object in a table with special values, as follows:
SELECT ra,dec,u,err_u FROM PhotoObj WHERE ra BETWEEN 180 AND 181 AND dec BETWEEN -0.5 AND 0.5 AND u BETWEEN -9999 AND 20.0 -- or "u > -9999 AND u < 20.0", -- instead of just "u < 20.0" AND err_u BETWEEN -1000 AND 0.1 -- or err_u > -1000 AND err_u < 0.1, -- instead of just "err_u < 0.1" |
Use the STR(column,n,d) SQL construct (where n is the total number of digits and d is the number of decimal places) to set the precision of the column that your query requests. The SkyServer returns values with a default precision that is set for each data type, and this may not be enough for columns like ra, dec etc. See the Selected neighbors in run or the Uniform Quasar Sample sample queries for examples of how to use STR.
You may wish to obtain quantities from multiple tables, or place constraints on quantities in one table while obtaining measurements from another. For instance, you may want magnitudes (from PhotoObj) from all objects spectroscopically identified (SpecObj) as galaxies. To perform these types of queries, you must use a join. You can join any two (or more) tables in the databases as long as they have some quantity in common (typically an object or field ID). To actually perform the join, you must have a JOIN subclause in the FROM clause of your query that specifies the common quantity to be equal in the two tables. Here is an example, getting the g magnitudes for stars in fields where the PSF fitting worked well:
SELECT TOP 10 s.psfMag_g FROM Star s JOIN Field f ON s.fieldID = f.fieldID WHERE s.psfMag_g < 20 AND f.pspStatus = 2 |
Notice how we define abbreviations for the table names in the FROM clause; this is not necessary but makes for a lot less typing. Also, you do not have to ask for quantities to be returned from all the tables. You must specify all the tables on which you place constraints (including the join) in the FROM clause, but you can use any subset of these tables in the SELECT. If you use more than two tables, they do not all need to be joined on the same quantity. For instance, this three way join is perfectly acceptable:
SELECT TOP 10 p.objID,f.field,z.z FROM PhotoObj p JOIN Field f ON f.fieldid = p.fieldid JOIN photoz z ON p.objid = z.objid WHERE f.psfWidth_r > 1.2 AND p.colc > 400.0 |
The type of joins shown above are called inner joins. In the above examples, we only return those objects which are matched between the multiple tables. If we want to include all rows of one of the tables, regardless of whether or not they are matched to another table, we must perform an outer join. One example is to get photometric data for all objects, while getting the spectroscopic data for those objects that have spectroscopy.
In the example below, we perform a left outer join, which means that we will get all entries (regardless of matching) from the table on the left side of the join. In the example below, the join is on P.objID = s.BestObjID; therefore, we will get all photometric (P) objects, with data from the spectroscopy if it exists. If there is no spectroscopic data for an object, we'll still get the photometric measurements but have nulls for the corresponding cpectroscopy.
select top 100 P.objID, P.ra, P.dec, S.SpecObjId, S.ra, S.dec from PhotoObj as P left outer join SpecObjAll as S on P.objID = s.BestObjID |
When using table valued functions, you must do the join explicitly (rather than using "="). To do this, we use the syntax
SELECT quantities
FROM table1
JOIN table2 on table1.quantity = table2.quantity
WHERE constraints
For instance, in the example below, we use the function dbo.fGetNearbyObjEq to get all objects within a given radius (in this case, 1') of a specified coordinate. This is a table-valued, so it returns a table, containing the ObjIDs and distances of nearby objects. We want to get further photometric parameters on the returned objects, so we must join the output table with PhotoObj.:
SELECT G.objID, GN.distance
FROM Galaxy as G
JOIN dbo.fGetNearbyObjEq(115.,32.5, 1.0) AS GN
ON G.objID = GN.objID
WHERE (G.flags & dbo.fPhotoFlags('saturated')) = 0
|
SQL provides a number of ways to reorder, group, or otherwise arrange the output of your queries. Some of these options are:
SELECT count(*) FROM Galaxy WHERE ra between 180.1 and 180.5 |
SELECT distinct run FROM Field |
Here a COUNT would return different counts depending on the column you selected, e.g.
SELECT count(distinct run) FROM Field |
would return a different number in general from:
SELECT count(distinct field) FROM Field |
SELECT top 10 r FROM Star |
SELECT top 10 u,g,r FROM Star order by g,r desc |
Note how repeatedly executing this query returns the same 100 rows. This is not true of the previous query, especially if you run it at different times so the cache does not come into play..
SELECT min(r),max(r) FROM PhotoPrimary group by type |
SELECT count(r) FROM PhotoPrimary group by type |
It is easy to construct very complex queries which can take a long time to execute. When writing queries, one can often rewrite them to run faster. This is called optimization.
The first, and most trivial, optimization trick is to use the minimal Table or View for your query. For instance, if all you care about are galaxies, use the Galaxy view in your FROM clause, instead of PhotoObj. We have also created a 'thin' version of PhotoObjAll, called PhotoTag. This vertical subset contains all the objects in PhotoObjAll, but only a subset of the measured quantities. Using the PhotoTag view to speed up the query only makes sense if you do NOT want parameters that are only available in the full PhotoObjAll.
It is extremely useful to think about how a database handles queries, rather than trying to write a plain, sequential list of constraints. NOT every query that is syntactically correct will necessarily be efficient; the built-in query optimizer is not perfect! Thus, writing queries such that they use the tricks below can produce significant speed improvements.
Another simple way to make queries faster is to use indexed quantities to search on. There are two types of indices in the CAS, indices built into the database, and an external spatial index that we have added to make spatial searches much faster, called the Hierarchical Triangular Mesh.. The latter is explicitly invoked by using the built-in spatial search functions like fGetNearbyObjEq (does a radial search in equatorial coordinates), or fGetObjFromRectEq (searches in a rectangular area). Database indices are invoked automatically when you include columns in your search (in the WHERE clause) that have indices built on them. There are 3 types of database indices:
If you must search on non-indexed columns in addition to the indexed ones, you can still benefit by first performing a query using only the indexed quantities, and then select those parameters from the returned subset of objects. An indexed quantity is one where a look-up table has effectively been calculated, so that the database software does not have to do a time-consuming sequential search through all the objects in the table. For instance, sky coordinates cx,cy,cz are indexed using a Hierarchical Triangular Mesh (HTM). So, you can make a query faster by rewriting it such that it is nested; the inner query grabs the entire row for objects of interest based on the indexed quantities, while the outer query then gets the specific quantities desired.
SELECT ...
FROM PhotoObj
WHERE
flags & dbo.fPhotoFlags('BLENDED') > 0
|
In this case, it would be better to first do the pre-query:
SELECT dbo.fPhotoFlags('BLENDED')
|
to get the bitmask value for that flag, and then rewrite the above query as:
SELECT ...
FROM PhotoObj
WHERE
flags & 8 > 0
|
This will avoid the wastefully repeated function call for each and every photobj in the table.This is even more important when you are using multiple flags and you can reduce the comparison to a single bitmask using flag arithmetic. In the final example above in the Querying Bit Flags section, you can replace the original query:
SELECT top 10 objid, flags FROM PhotoTag
WHERE
( flags & (dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2')) ) = 0
with a more efficient version by first running the following pre-query:
SELECT (dbo.fPhotoFlags('NODEBLEND')
+ dbo.fPhotoFlags('BINNED1')
+ dbo.fPhotoFlags('BINNED2'))
which returns the bitmask value 805306432, which can in turn be substituted back in the
original query as follows:
SELECT top 10 objid, flags FROM PhotoTag
WHERE
( flags & 805306432 ) = 0
so as to save 3 function calls and make the query significantly more efficient. (In this particular example it does
not matter because we are only asking for 10 rows, but if the "TOP 10" were to be removed and the query was run
on millions of rows, it would make a difference).