Objective: Quickly extract the 3D coordinate tuple and 4th-dimensional value from a LONI UCF
Notes: Briefly, each UCF point is stored as a floating-point 3-tuple. UCFs can also accommodate 4-tuples, where the fourth (4D) value holds some descriptive data, such as cortical thickness, a p-value, or a beta value. Reading and writing UCFs is typically done in Java. However, for quick analyses, particularly of the 4D values, R may be more convenient. The code snippet below extracts these data points from UCFs.
The LONI UCF format is described as follows:
NAME
    ucf - LONI Universal Contour File format

SYNOPSIS
    #include "/usr/local/lib/loni/ucf.h"

DESCRIPTION
    Files with the ucf extension are LONI Universal Contour Files. These files contain structure outline information with the following features.

    o Outlines are divided into planes called levels.
    o Levels may contain any number of closed loops called contours.
    o Each contour contains a list of consecutive points.
    o Each point is a floating point 3-tuple.

FIELDS
    The fields in a ucf, normally in this order, are

    <width=>
    image_width_in_pixels
        of the image from which the outlines were made.

    <height=>
    image_height_in_pixels
        of the image from which the outlines were made. Neither width nor height should be needed, but some older software expects them.

    <xrange=>
    xlo xhi
        the real space coordinates of the extent in x for the volume used to draw the ucf. Normally this is in microns.

    <yrange=>
    ylo yhi
        the real space coordinates of the extent in y for the volume used to draw the ucf. Normally this is in microns.

    <zrange=>
    zlo zhi
        the real space coordinates of the extent in z for the volume used to draw the ucf. Normally this is in microns.

    <levels>
    number_of_levels
        contained in the ucf.

    <level_number=>
    index_of_level
        Normally the distance between the sampling plane of the level and the origin. The first declaration for starting a new level.

    <point_num=>
    number_of_points
        the number of points in the ensuing contour. The first declaration for starting a new contour within the current level

    <contour_data=>
    first point x, y, z
    second point x, y, z
        in all, list of point_num points as set be the previous declaration.

    <end of level>
        last line of a level.

    <end>
        last line of the ucf.

EXAMPLES
    This is an example of a ucf output by the program maud. The outlines were made in the plane of two original images from the volume. One image had two closed loops, the other had one.
<width=>
512
<height=>
512
<xrange=>
0.000000 185000.000000
<yrange=>
0.000000 185000.000000
<zrange=>
1200.000000 165613.390625
<levels>
2
<level number=>
83400.000000
<point_num=>
794
<contour_data=>
61498.046875 86935.546875 83400.000000
61895.507813 86935.546875 83400.000000
[ 791 lines deleted ]
62292.968750 87333.007813 83400.000000
<end of level>
<level number=>
141600.000000
<point_num=>
303
<contour_data=>
88127.929688 85743.164063 141600.000000
88525.390625 85345.703125 141600.000000
[ 300 lines deleted ]
90512.695313 90512.695313 141600.000000
<point_num=>
292
<contour_data=>
96474.609375 79383.789063 141600.000000
96474.609375 79781.250000 141600.000000
[ 289 lines deleted ]
96474.609375 78191.406250 141600.000000
<end of level>
<end>

AUTHOR
    Brad Payne
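To make the field layout concrete, here is a minimal R sketch that reads a few declarations from a miniature UCF held in memory. It is only an illustration: the `ucf` vector is made up, `tag_value` is a hypothetical helper, and the code assumes that each tag sits on its own line with its value on the line directly after it.

```r
# Hypothetical miniature UCF. Assumption: every tag is on its own line and
# its value follows on the next line.
ucf <- c(
  "<width=>", "512",
  "<height=>", "512",
  "<xrange=>", "0.000000 185000.000000",
  "<yrange=>", "0.000000 185000.000000",
  "<zrange=>", "1200.000000 165613.390625",
  "<levels>", "1",
  "<level number=>", "83400.000000",
  "<point_num=>", "2",
  "<contour_data=>",
  "61498.046875 86935.546875 83400.000000",
  "61895.507813 86935.546875 83400.000000",
  "<end of level>",
  "<end>"
)

# Hypothetical helper: return the line that follows each occurrence of a tag.
tag_value <- function(lines, tag) {
  pos <- which(lines == tag)
  lines[pos + 1]
}

n_levels <- as.integer(tag_value(ucf, "<levels>"))
n_points <- as.integer(tag_value(ucf, "<point_num=>"))  # one entry per contour
zrange   <- as.numeric(strsplit(tag_value(ucf, "<zrange=>"), " ")[[1]])
```

In a file with several contours, `n_points` would hold one count per `<point_num=>` declaration, so `sum(n_points)` gives the total number of point lines to expect.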
Code
# References
# 1. http://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html
#    Apparently R's regex engine is slightly different from what I'm used to?
# 2. http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
# 3. http://en.wikibooks.org/wiki/R_Programming/Text_Processing#How_can_I_extract_a_pattern_in_a_string_.3F
# 4. http://stackoverflow.com/questions/5237557/extracting-every-nth-element-of-a-vector
# 5. http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
# 6. http://stackoverflow.com/questions/8865633/r-data-frame-how-to-control-the-conversion-of-matrix-containing-scientific-nota

# Read the file into a character vector, one element per line.
pmap <- scan("sample.ucf", character(0), sep = "\n")

# Extract lines with 4D data (ignore metadata, etc.)
# Here, "pmap" contains 67606 character elements. "lines" contains 66049.
# The pattern looks for a value such as 1.23E-04 followed by a space.
linepos <- grep("[0-9][.][0-9]+[E]-*[0-9]{2}[ ]", pmap)
lines <- pmap[linepos]

# Create a function to split each string using space as the delimiter
f <- function(x) strsplit(x, split = " ")

# Convert lines to a data frame (while taking care to avoid converting strings
# into factors), so we can use "apply" to run the above function over each row.
# Applying strsplit creates an unwieldy list of 66049 lists, each containing four
# character elements: our x, y, z, and p.
# We proceed to unlist, which produces a vector containing 264196 character
# elements (or 66049 * 4).
vals <- apply(as.data.frame(lines, stringsAsFactors = FALSE), 1, f)
unvals <- unlist(vals)

# Now we convert this vector into a 66049x4 character matrix.
m <- matrix(unvals, ncol = 4, byrow = TRUE)

# However, to work with the values, we want them to be numeric, not character.
# We also don't want them to be in scientific notation, so we switch the matrix's
# mode to numeric. Converting the matrix into a data frame using "as.data.frame"
# completes this exercise.
mode(m) <- "numeric"
m <- as.data.frame(m)
colnames(m) <- c("x", "y", "z", "p")
str(m)
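As a possible shortcut, the split, unlist, and matrix steps can be collapsed into one call to `read.table`, which splits on whitespace and converts scientific notation to numeric on its own. This is an alternative sketch, not the approach used above, and the two sample lines are made-up stand-ins for the real `lines` vector.

```r
# Stand-in for the "lines" vector of 4D data lines (values are made up).
lines <- c(
  "6.1498E+01 8.6935E+01 8.3400E+01 1.2000E-02",
  "6.1895E+01 8.6935E+01 8.3400E+01 4.5000E-01"
)

# read.table(text = ...) parses the strings as if they were lines of a file,
# splitting on whitespace and converting each column to numeric.
m <- read.table(text = lines, col.names = c("x", "y", "z", "p"))
str(m)
```

The result is already a data frame with numeric columns, so no `mode(m) <- "numeric"` step is needed.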
Notes:
The code above assumes the UCF data to be formatted in scientific notation. If the data is not in scientific notation, use:
# Extract lines with 4D data (ignore metadata, etc.)
# Here, "pmap" contains 67606 character elements. "lines" contains 66049.
linepos <- grep("[0-9]*[.][0-9]+[ ]?[0-9]*[.][0-9]+", pmap, perl = FALSE)
lines <- pmap[linepos]
# Drop the first three matches, which are header lines rather than points.
lines <- lines[-(1:3)]
I haven’t found a working, cleaner regex yet, hence the need to delete the first three elements of the vector (which hold the x, y, z range info from the UCF header).
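One candidate for a cleaner filter is to keep only lines that consist of exactly four numbers, which excludes the two-value range lines without any manual deletion. The sketch below is untested against real UCFs: it assumes 4D points are four space-separated values on one line, and the `pmap` contents, `num`, and `four_tuple` are made up for illustration.

```r
# Hypothetical UCF fragment with 4-tuples in plain decimal notation.
pmap <- c(
  "<xrange=>", "0.000000 185000.000000",
  "<yrange=>", "0.000000 185000.000000",
  "<zrange=>", "1200.000000 165613.390625",
  "<levels>", "1",
  "<level number=>", "83400.000000",
  "<point_num=>", "2",
  "<contour_data=>",
  "61498.046875 86935.546875 83400.000000 0.012000",
  "61895.507813 86935.546875 83400.000000 0.450000",
  "<end of level>",
  "<end>"
)

# One number: optional minus sign, digits with an optional decimal point,
# and an optional exponent (so scientific notation is accepted too).
num <- "-?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?"

# A whole line made of exactly four such numbers separated by spaces.
four_tuple <- paste0("^[ ]*(", num, "[ ]+){3}", num, "[ ]*$")

lines <- grep(four_tuple, pmap, value = TRUE)
length(lines)
```

Because the pattern is anchored at both ends, header values with one or two fields and ordinary 3-tuple points are skipped, and the same pattern covers both the decimal and the scientific-notation case.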