Solutions In Perception Challenge
The new solutions in perception site is at http://solutionsinperception.org/
Below is the page for the first contest in May of 2011
Update 23 May, 2011: Pictures!
Update 19 May, 2011: The ground truths for the NIST and Willow Garage data sets have been released. The ground truths have been packaged along with the Python script used to score the output CSV files from your TOD_Stub implementations, and can be found here. A guide for reading and understanding the report files generated by the scoring script is also available.
Update 16 May, 2011: The 2011 Solutions in Perception Challenge is over. Finalized reports have been released to all of the participating teams. A simplified version of the performance report can be found here (note that the team names and affiliations have been redacted).
Short form results:
- First place: Berkeley (team 7) 68.78%
- Second place: Jacobs (team 3) 66.41%
- Third place: Stanford (team 6) 53.61%
Update 6 April, 2011: The code for tod_stub has been updated and development has been frozen. See the Starting Code section for more information.
Update 29 March, 2011: New data sets from Willow Garage and NIST are available in the Data section below.
Contact: add gmail.com to end of: solutionsinperception
- If you want to join, send to solutionsinperception with "Join" in the subject line.
This challenge aims, step by step, to remove the main bottleneck to agile robotics:
Reliable perception;
It seeks to develop and document solutions to pragmatic robotic sensing problems.
Problem Statement:
Most existing vision databases:
And the main existing vision challenges:
The PASCAL Visual Object Classes Challenge
Funded under the EU Pattern Analysis, Statistical Modeling and Computational Learning grant.
Are either
- implicitly aimed at image retrieval tasks
collections of web images from random, uncalibrated cameras in random scenes,
do not often emphasize finding the pose of an object.
- or else still do not directly address/document what problems vision/perception can actually solve
for example, finding 50% of the bicycles in Flickr images does not allow one to confidently drive a robot car where there are cyclists.
The existing challenges push on the "false positive" part of the recognition curve:
Existing challenges do not take advantage of robotic sensing, where we may have:
- Standard, known, calibrated 2D and 3D sensors
- Known context/scenes
- Ability to alter the scene to our advantage
Feedback (a robot may move to get better views).
Proposed Answer
For robotic [and perhaps some cell phone] needs, the Solutions in Perception Challenge seeks to incrementally and progressively establish solutions to pragmatic sensing problems. This challenge addresses the "True Positive" part of the recognition curve:
Use of active depth sensing, 2D and 3D data is explicitly allowed, but not instrumentation of the objects.
This challenge is aimed at removing the perception barriers to advanced, sensing based, agile robotics.
The Competition
To Join:
Send an email with subject "Join" to: add ''gmail.com'' to end of: solutionsinperception
This challenge is intended to be run many times, progressively setting harder goals. The current challenge is described below.
This challenge is sponsored by http://www.willowgarage.com and NIST.
We are looking for members for a program committee that will shape the future direction of this challenge. For now, send inquiries with "Solutions in Perception" somewhere in the subject line to bradski (who is at) willowgarage postpended with . com or solutionsinperception (who has an account at) gmail (singular period) com
Current Competition: ICRA 2011
The first competition will be held at ICRA 2011 in the Robot Challenge section. The goal of this first contest is to show that moderately Lambertian, rigid objects that have a lot of texture on them can be reliably recognized and their pose in 3D determined using 2D and/or 3D sensors at close range (roughly personal robot arm work space, 0.5 to 1.25 meters). If accomplished, this challenge will demonstrate that robots can reliably find and manipulate such objects under indoor fluorescent lighting conditions.
This challenge will make use of the Kinect sensor, which gives 2D RGB and 3D point clouds at 1M pixel resolutions:
A Kinect sensor (designed by PrimeSense) shown mounted on a PR2 Robot from Willow Garage.
With the goal of recognizing the identity and pose of 50 objects:
- Recognition and 6 degree of freedom (6dof) pose of an Odwalla juice bottle found in clutter.
Conditions of the first contest
The first contest will be held at ICRA 2011 in the Robot Challenge section.
The goal of this contest will be to recognize the identity and pose of 50 fairly rigid, fairly Lambertian, textured objects. One example would be something like a paper tea box. 35 of the objects will be known beforehand, 15 will be new for the contest.
The sensor will be a Kinect sensor using drivers provided by PrimeSense and Willow Garage, see the ROS kinect page here.
The data (see the "Data" subsection below) will consist of extensive training and test sequences with ground truth of the 35 known objects. The data will consist of 2D images, 3D point clouds and labeled object identity and pose. Fifteen additional objects will be introduced at the contest (making for a total of 35+15=50 objects). The 15 new objects will be produced by NIST but will be Lambertian, rigid and textured like the rest of the objects.
- The lighting will be indoor, diffuse fluorescent lighting, conditions where the Kinect works well.
The code will retain the user's choice of license subject to the "Contestant's Code" subsection below.
Contestants will write their code to run with ROS. A code stub is provided so that the contestant does not need to learn ROS. The stub will just get a point cloud and image, display them and give an example of filling out the object identity, certainty and pose (the filled out pose is randomly chosen in the stub). Using the provided code stubs, your code will automatically run on the PR2 robot as well.
- The code will implement training, test and object persistence functions.
Scoring, see the "Contest Format" subsection below. There will be 2 elimination rounds followed by a real round.
- The starting rounds will have single objects at a time, then multiple objects where we look for object ID only.
- This will be followed by single objects where we stress the accuracy of precise pose estimation.
- Finally, there will be cluttered, fixtured scenes where the algorithm must recognize multiple objects and their pose.
The top performing code will be demonstrated on a PR2 Robot grasping the items.
- All the scores will be cumulative when deciding the prize money.
- Conference organizers reserve the right to make judgement calls at any point and on any matter in the contest. These judgements will be final.
Prizes will be awarded exponentially according to performance (rounding down), see the "Prizes" subsection below.
Code
Contestant's Code
This contest aims to establish and to advance what perception problems have workable solutions for robots. To do such advances, researchers must be able to train and test against previous code, ensuring fair and easily repeatable comparison and experimentation with past work. We also hope, however, to foster the growing use of robotics in society. To achieve this, contestants will:
- Use any license that they desire for their code. It may be closed, open, free or paid subject to:
- You will license the contest organizers, which include Willow Garage and NIST, rights to the source code to build, run and maintain your code (which may include making minor alterations to your code in order to resolve bugs or issues with ROS) on servers of the contest organizers' choice, which right now specifically include Willow Garage and/or NIST servers, for the purposes of the contest.
- You will further license others to run your code on the contest's servers, which right now include Willow Garage and/or NIST servers. Researchers must be able to train and/or test old and new data sets against your code, but only on the servers specified by the contest organizers. This does not give others the right to run your code on their systems or products. It is just to have a means to compare newer algorithms directly against older algorithms. Again, the licensing of further uses of the contestant's code is up to the authors of that code.
- You will allow the contest organizers and external researchers to publish and track any of the results of such algorithm testing.
We are thinking about organizing an optional "perception apps" licensing store. As in existing app stores, the apps may be posted for free or for a charge. The hope is that license terms will be such as to allow for mass use. This app store is just an idea at this point and so it may or may not happen. Random thoughts about this are that it would look something like the mash-up visual imagery programming site Processing but would help put together solutions that could be toyed with while tracking the reasonable licensing fees for commercial use.
- The contestants will provide code that can be trained and tested for classification and object pose.
- The competition will run from code posted to our server that was pre-trained on the 35 objects and trained by the committee on the 15 new objects.
- The code has 15 seconds per scene to recognize objects.
Starting Code
The contest will be run on ROS, the "Robot Operating System". This is to enable maintainability and easy testing, and to allow running on a robot (PR2) without having to know the details of that robot: the recognition and pose code will automatically send messages that work with the PR2 grasping and manipulation pipeline.
Instructions for downloading and using the code stub for getting 2D images and 3D point clouds for training and test can be downloaded from tod_stub (Textured Object Recognition stub). In brief, you will do the following:
- Inside the stub code:
In tod_stub/src/trainer.cpp, you can find the calibration board based pose estimator poseestimator. Based on this estimation, the raw point cloud sensor output will be translated to the calibration board frame.
Your implementation would start by modifying tod_stub/src/tod_stub_impl.cpp. You will fill out the MyTrainer::process method after the line "//do awesome training here.", and replace the code in MyDetector::detect (which currently just returns random values for R (rotation matrix), t (translation vector), ID and confidence).
MyTrainer::process will provide you with the transformed point cloud data, the raw 2D color image and the object id, and expects you to train your object recognition algorithm with the data.
- The result of running the training process is up to the user.
- The training process should persist whatever data is needed to disk.
- Make sure to associate the given object id with any training data.
- Each frame should be considered as a single view of a particular object.
MyDetector::detect will provide you with the raw sensor-provided point cloud data and the raw 2D color image, and expects you to determine the object ID, pose (rotation and translation) and a confidence level, and then store this information in a result vector passed by reference to the detect method.
- After modifying the source code, rebuild the stub code by typing "rosmake" in the tod_stub project folder
- To run and test your training code:
rosrun tod_stub trainer -B <BAGFILE_NAME>.bag --fiducial <FIDUCIAL_NAME>.yml -C <USER_DEFINED_CONFIG_FILE>.yaml --image <IMAGE_TOPIC> --camera_info <CAMERA_INFO_TOPIC> --points <POINT_CLOUD2_TOPIC> --team_name <TEAM_NAME> --run_number <RUN_ID> --object_id <OBJECT_ID>
<BAGFILE_NAME>.bag is the file provided by us that contains the training data. Within the bag file, you will find the point cloud information, the 1 megapixel color image, and camera information. Because ROS is constantly evolving, the names of these topics are subject to change. As a result, you must specify the topic names:
<IMAGE_TOPIC> is the topic name of the Image message. The default value is image_color.
<CAMERA_INFO_TOPIC> is the topic name of the Camera_Info message. The default value is camera_info.
<POINT_CLOUD2_TOPIC> is the topic name of the Points2 cloud message. Default value is points.
<FIDUCIAL_NAME>.yml is a file describing the fiducial that is used to calculate the coordinate system of the object. This should be supplied with the training data.
<USER_DEFINED_CONFIG_FILE> is a file where you may describe parameters for your algorithm. The formatting and handling of this file will be up to you to define.
<TEAM_NAME> is a character string representing your team's name. This will be appended to the resulting output comma separated value (CSV) file generated by the trainer, and will be used to identify your team for scoring purposes.
<RUN_ID> is a numerical identifier for the current run number.
<OBJECT_ID> the object identifier, it is assumed here that the bag contains only one object. The object id should not contain spaces.
- The trainer is meant to be a once per item application.
- Running trainer will result in a disk based persistence of your training algorithm and a ground truth output CSV file (see below). You should persist your data in a way that is locally consistent (e.g., one file or directory per training session). See the stub for an example.
- For using the pose that is included in the new training bags, run with the --pose topic defined:
rosrun tod_stub trainer --pose "pose" --object_id object02 -B object02.bag
- To run and test your detector code:
rosrun tod_stub detector -B <BAGFILE_NAME>.bag -C <USER_DEFINED_CONFIG_FILE>.yaml --image <IMAGE_TOPIC> --camera_info <CAMERA_INFO_TOPIC> --points <POINT_CLOUD2_TOPIC> --team_name <TEAM_NAME> --run_number <RUN_ID>
<BAGFILE_NAME>.bag is the file provided by us that contains the test data. Within the bag file, you will find the point cloud information, the 1 megapixel color image, and camera information.
<IMAGE_TOPIC> is the topic name of the Image message. The default value is image_color.
<CAMERA_INFO_TOPIC> is the topic name of the Camera_Info message. The default value is camera_info.
<POINT_CLOUD2_TOPIC> is the topic name of the Points2 cloud message. Default value is points.
<USER_DEFINED_CONFIG_FILE> a configuration file that may be used by the user to set parameters. The formatting and handling of this file will be up to you to define.
<TEAM_NAME> is a character string representing your team's name. This will be appended to the resulting output CSV file generated by the detector, and will be used to identify your team for scoring purposes.
<RUN_ID> is a numerical identifier for the current run number.
- Running detector will produce a visualization of the detected objects and their poses, and will also automatically generate a CSV file of all objects detected in every frame.
- To test the output of your detector code:
- Both the trainer and detector code bases will automatically output CSV plain text files that can either be imported into a spreadsheet application like MS Excel or read manually in a text editor.
- The output file should be named using the following conventions:
File Name: RUN<run ID>_<Team Name>_<yyyy><mm><dd>_<HH:MM:SS>.csv
<run ID>: A 4-digit integer unique to dd [0000 .. 9999]
<Team Name>: A string consisting of <= 10 characters (e.g. NIST-Tiger) that is passed in to your program as a command line option
<yyyy>: 4-digit year [2011]
<mm>: 2-digit month: [01 ... 12]
<dd>: 2-digit day: [01 ... 31]
<HH:MM:SS>: 2-digit hour (24-hour clock), minute and second marking the start of the run.
- An example file name would be as follows: RUN0002_NIST-Tiger_20101220_22:12:23.csv
- The format of the file will be as follows:
- The first line should consist of the header, "Ts,Run,Frame,dID,oID,R11,R12,R13,R21,R22,R23,R31,R32,R33,Tx,Ty,Tz"
- Each subsequent line will convey the object properties for each unique detection in every frame, and the values of each line in the file should thus be as follows:
- Ts: The time recorded by the system when this object was located, in the format "hr.min.sec.msec" (e.g., 10.22.34.046).
- Run: The 4-digit test run number (e.g., 0003). A run is a collection of frames, and could be either a ROS bag or a live stream from the robot.
- Frame: The 3-digit frame number within the run. Starting from 0. (e.g., 000)
- dID: The 3-digit detection number (e.g., 005). The detection numbers are user-specific and are used to enumerate detections (Note: a detection could be false-positive).
- oID: Uniquely defined string representing an object. This string should be the same as the value of object_id in the TrainData data structure (e.g., tilex, campbells_chicken_noodle, etc.).
- R11,R12,R13,R21,R22,R23,R31,R32,R33,Tx,Ty,Tz: The Object pose consists of translation vector (Tx, Ty, Tz) and a 3x3 rotation matrix in the sensor coordinate. Note that the object may move between frames, but the sensor itself is static.
- Here is an example detection, with fields in the header order (4-digit run, then 3-digit frame): 10.22.34.046,0003,020,005,tilex,R11,R12,R13,R21,R22,R23,R31,R32,R33,Tx,Ty,Tz
- We will provide plenty of data for training, testing and validation, but will also provide a list of objects and places to order calibration sets and fixtures for your own development if desired.
- Prior to the contest, contestants will check in their code to be trained and run on our server.
These same instructions can be found in your local tod_stub repository in tod_stub/tod_stub.doc.txt.
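The output conventions above can be sketched in a few lines of Python. This is only an illustration, not the stub's own writer (the stub itself is C++ and generates these files for you). The function names are hypothetical, the six-decimal number format is an assumption (the page does not specify one), and rejecting commas in object ids is an extra precaution that is not stated above.

```python
from datetime import datetime

HEADER = "Ts,Run,Frame,dID,oID,R11,R12,R13,R21,R22,R23,R31,R32,R33,Tx,Ty,Tz"


def output_file_name(run_id, team_name, start):
    """Build a name shaped like RUN0002_NIST-Tiger_20101220_22:12:23.csv."""
    if not 0 <= run_id <= 9999:
        raise ValueError("run ID must fit in 4 digits")
    if len(team_name) > 10:
        raise ValueError("team name must be at most 10 characters")
    stamp = start.strftime("%Y%m%d_%H:%M:%S")
    return "RUN%04d_%s_%s.csv" % (run_id, team_name, stamp)


def format_timestamp(t):
    """Format a datetime as hr.min.sec.msec, e.g. 10.22.34.046."""
    return "%02d.%02d.%02d.%03d" % (t.hour, t.minute, t.second, t.microsecond // 1000)


def detection_row(ts, run, frame, det_id, object_id, R, t):
    """Format one detection line.

    R is a 3x3 rotation as nested lists (row-major), t is (Tx, Ty, Tz).
    """
    if " " in object_id or "," in object_id:
        # Spaces are disallowed by the contest; commas would break the CSV (assumption).
        raise ValueError("object id must not contain spaces or commas")
    fields = [format_timestamp(ts), "%04d" % run, "%03d" % frame,
              "%03d" % det_id, object_id]
    fields += ["%.6f" % v for row in R for v in row]  # R11..R33
    fields += ["%.6f" % v for v in t]                 # Tx, Ty, Tz
    return ",".join(fields)


# Example usage with made-up values.
when = datetime(2010, 12, 20, 22, 12, 23, 46000)
name = output_file_name(2, "NIST-Tiger", when)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
row = detection_row(when, 3, 20, 5, "tilex", identity, (0.1, -0.2, 0.9))
```

Each row produced this way has the same 17 comma-separated fields as the header, which makes a quick field-count check a cheap sanity test before submitting a file for scoring.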
Data
The data will consist of many ROS bag files which can collect all the data and messages sent out in a real time session.
The bag files may be played back using rosbag on a command line such as: rosbag play recorded1.bag recorded2.bag ....
Rosbag has many command line options such as play, record, info, compress, and decompress.
You may examine the contents of a bag file using rxbag.
Data may be downloaded from: this site.
- There are 35 models with train data. Your code must be able to train from bag files since an additional 15 like items are being withheld prior to the contest.
- The objects from the above link are new as of 3/9/2011, and supersede the preliminary data released in February.
- Test data is also available at the same link above. Be aware that, although the test data contains a checkerboard in the images, you should not expect such calibration features to appear in the actual competition.
NEW (03/29/2011) Representative training and test sets from NIST have been released, complete with ground truth. These can be found here.
- In the above directory, you will find:
- 2 movie files:
obj16.bag.ogv (training data)
obj16test.bag0.ogv (test data)
- training bag files (one with .tf.bag extension):
obj16.bag.zip
obj16.tf.bag.zip (only needed if you want to visualize with rviz)
- 1 test bag file:
obj16test.bag0.zip
- 1 fiducial yml file:
fiducial_NIST-training.yml
- 2 ground-truth CSV files, one for training data and one for test data:
RUN0000_NIST-training_20110318_17.48.31.csv
RUN0000_NIST_20110322_08.00.47.csv
- Training: You will need the fiducial yml file and the training bag file to run the tod_stub "trainer" program. Run it as below:
cd ~/tod_stub_dev/tod_stub/bin/
./trainer -B obj16.bag -F fiducial_NIST-training.yml -I obj16 --image image --camera_info camera_info --points points2
- Detection: You will need the test bag file for the tod_stub's "detector" program. Run it as below:
cd ~/tod_stub_dev/tod_stub/bin/
./detector -B obj16test.bag0 --image /camera/rgb/image_color --camera_info /camera/rgb/camera_info --points /camera/rgb/points
- Ground Truth: There are two variants of ground-truth data: One uses a machined ground-truth apparatus, and the other uses the fiducial checker-board.
Machined apparatus: Each training frame has a unique object pose. This is recorded in the CSV file: RUN0000_NIST-training_20110318_17.48.31.csv
Note: the "Frame" field in the CSV file matches the "Seq" field of the ROS PointCloud2 data header in the bag file.
Similarly, RUN0000_NIST_20110322_08.00.47.csv contains the true object pose for each object in the test bag file.
- Fiducial checker-board: Each training frame has a unique object pose. This is determined by running the tod_stub's "trainer" program, which should also output a ground-truth CSV file.
Note: the "Frame" field in the CSV file matches the "Seq" field of the ROS PointCloud2 data header in the bag file.
- To determine the ground truth for the test data in the fiducial convention, you could use the machined-apparatus ground truth described above: first establish the mapping between the two variants of the training ground truth, then apply this mapping to the machined-apparatus test ground truth to obtain another set of test ground truth. This test ground truth should have the same object-frame convention as determined by the fiducial checker-board presented in the training data.
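One way to read the mapping between the two ground-truth variants is as a fixed rigid offset between two object frames. The sketch below works under assumptions that are not confirmed by this page: both CSV files give the object pose in the sensor frame as a rotation R and translation t, and the offset between the machined-apparatus object frame and the fiducial object frame is the same in every frame. All function names are hypothetical. With real, noisy data you would likely want to estimate the offset from many training frames rather than one.

```python
def make_pose(R, t):
    """Pack a 3x3 rotation (nested lists) and a translation into a 4x4 transform."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Invert a rigid transform: the rotation is transposed, t' = -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t)


def frame_offset(machined_pose, fiducial_pose):
    """Offset X with fiducial_pose = machined_pose * X, from one training frame."""
    return matmul(invert_rigid(machined_pose), fiducial_pose)


def to_fiducial_convention(machined_test_pose, offset):
    """Re-express a machined-apparatus test pose in the fiducial object frame."""
    return matmul(machined_test_pose, offset)


# Example with made-up numbers: a 90 degree rotation about z plus a translation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_machined = make_pose([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                           [0.5, 0.0, 1.0])
true_offset = make_pose(IDENTITY, [0.1, 0.2, 0.0])
train_fiducial = matmul(train_machined, true_offset)

offset = frame_offset(train_machined, train_fiducial)
test_fiducial = to_fiducial_convention(make_pose(IDENTITY, [0.0, 0.0, 0.8]), offset)
```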
- Data conditions
- The data will consist of items on a table that may appear alone or in clutter.
- The lighting will be indoor, fluorescent.
- The Kinect camera will be set at 1M pixel resolution with objects between 0.6 and 1.2 meters away.
- In cluttered scenes, there may be duplicate objects.
The poses of the items will be variable in 6DOF (so you should anticipate 3D translations and rotations)
- The 35 objects in order, left-to-right, top-to-bottom are shown below:
- 35 known objects corresponding to the above picture in order, left-to-right, top-to-bottom:
- The 35 known objects on their ground truth keys in order right to left, bottom to top.
35 Objects
| Number | Object | Size US | Size Metric |
| 1 | Odwalla Orange Juice | 15.2 fl. oz | 450 mL |
| 2 | Odwalla Summertime Lime | 15.2 fl. oz | 450 mL |
| 3 | Spam Original | 12oz | 340g |
| 4 | Learning OpenCV | -- | -- |
| 5 | Mop & Glow | 1qt | 946mL |
| 6 | Silk Original soy milk | 1qt | 946mL |
| 7 | Claritin-D 24 hour | 10 tablets | 10 tablets |
| 8 | Hershey's Cocoa Special Dark | 8oz | 226g |
| 9 | Tazo Organic Chai tea | 1.9oz (20 bags) | 54g (20 bags) |
| 10 | Good Earth Original | 1.43oz (18 bags) | 41g (18 bags) |
| 11 | Shedd's Spread Country Crock (margarine) | 15oz | 425g |
| 12 | Kellogg's Raisin Bran | 25.5 oz | 723g |
| 13 | Tilex Mold & Mildew Remover | 1qt | 946mL |
| 14 | Arm & Hammer Detergent | 50 fl oz | 1.47L |
| 15 | Tide detergent | 50 fl oz | 1.47L |
| 16 | Downy detergent | 34 fl oz | 1.02L |
| 17 | All detergent | 50 fl oz | 1.47L |
| 18 | Nestle Coffee Mate French Vanilla | 16 fl oz | 473mL |
| 19 | Gillette complete skin care shaving cream | 7oz | 198g |
| 20 | All 3x ultra detergent | 20 fl oz | 0.59L |
| 21 | General Mills Oatmeal Crisp | 17oz | 481g |
| 22 | Del Monte Peas & Carrots | 14.5 oz | 411g |
| 23 | Campbell's soup at hand creamy tomato | 10.75oz | 305g |
| 24 | Clorox regular bleach | 60 fl oz | 1.77L |
| 25 | Coke | 12 fl oz | 355mL |
| 26 | Ziploc plastic bags 6 1/2 x 5 7/8in (16.5 x 14.9cm) | 50 bags | 50 bags |
| 27 | Campbell's condensed tomato soup | 10.75oz | 305g |
| 28 | Progresso Traditional New England Clam Chowder | 18.5 fl oz | 524g |
| 29 | Campbell's just heat & enjoy tomato soup | 15.4oz | 435g |
| 30 | Crest tooth paste tartar control plus scope | 4.6oz | 130g |
| 31 | Colgate toothpaste Icy Blast | 4.6oz | 130g |
| 32 | Bumble Bee Chunk White albacore tuna in water | 5oz | 142g |
| 33 | Band-Aid plastic strips Johnson&Johnson | 3/4 x 3 in (60 band-aides) | 1.9 x 7.6cm (60 band-aides) |
| 34 | Jell-o Strawberry | 6oz | 170g |
| 35 | Tropicana 100% pure Orange Juice with calcium (6 pack) | 8 fl oz | 240mL |
- All the known objects on their keys in order, left to right, back to front.
- A sample from the NIST collection of objects that are representative of textured manufacturing parts.
Keys
- The object coordinate system
- The object coordinate system as defined by the key.
Contest Format
- Contestants will upload their classifier code which does training, testing and persistence 7 days prior to the conference (not workshop) start. They will also provide their trained model on the 35 known objects, remembering that 15 new items will need to be learned.
- The classifiers will then be trained by the conference committee on 15 new objects. These new objects will be part of the competition which will then include 35 previous objects and 15 new objects for a total of 50 objects.
- Elimination: We will eliminate the lowest performing competitors such that we'll have only 3 left on the last round.
- The challenge will run in several rounds, the code will run on our server:
- Identify individual objects in many poses.
- Identify multiple objects (cluttered scene) in many poses.
- Accurately identify the ID and poses of individual objects.
- Identify the ID and pose of many objects in many poses.
- Periodically, the leading contestants' code will be run on a PR2 robot that will use the code to ID objects and their poses so that the robot can pick them up. This part will not be judged.
- Contestants will not have to write any robot specific code and will specifically not have to write grasping code. By following the provided code stubs I/O formats, the code will automatically run on the robots' grasping pipeline.
- Some travel support will be available to selected contestants.
Scoring:
- The classifier code will have 15 seconds to recognize each scene.
- +1.0 for identifying each object in a scene, -0.5 for missed detections, -1.0 for false detections. Correct pose is +1.0 and falls off with error until it is set to zero at some error threshold.
- Minimum score per scene is 0.
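To make the scoring rules concrete, here is a rough per-scene sketch. It is not the official scoring script. It assumes a linear falloff of the pose score (the page only says the score "falls off with error"), a single scalar pose error per detection, and a hypothetical threshold value supplied by the caller. The function names are illustrative.

```python
def pose_score(error, threshold):
    """+1.0 at zero error, dropping linearly to 0 at the threshold (assumed shape)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return max(0.0, 1.0 - error / threshold)


def scene_score(truth_ids, detections, threshold):
    """Score one scene.

    truth_ids:  object ids actually present (duplicates allowed).
    detections: list of (object_id, pose_error) pairs reported by the detector.
    """
    remaining = list(truth_ids)
    score = 0.0
    for object_id, error in detections:
        if object_id in remaining:
            remaining.remove(object_id)  # each true object is matched at most once
            score += 1.0 + pose_score(error, threshold)
        else:
            score -= 1.0                 # false detection
    score -= 0.5 * len(remaining)        # missed detections
    return max(0.0, score)               # minimum score per scene is 0


# Example: one correct object with perfect pose, one false detection, one miss.
example = scene_score(["tilex", "coke"], [("tilex", 0.0), ("spam", 0.0)], 0.05)
```

In the example, the correct detection earns 2.0, the false detection costs 1.0 and the missed object costs 0.5, so the scene scores 0.5 under these assumptions.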
Hardware
Contestant's code will ultimately need to run on a PR2 robot. You can read the PR2 Spec Sheet here.
The PR2 has two onboard servers, each with two quad-core i7 Xeon processors (8 cores per server), for a total of 16 cores. The summary specs are:
2x Onboard Servers
Processors :: Two Quad-Core i7 Xeon Processors (8 cores)
Memory :: 24 GB
Externally Removable Hard Drive :: 1.5 TB
Internal Hard Drive :: 500 GB
These are the detailed PR2 CPU specs.
Some robots have GPUs, but do not assume that we will have them. You can multi-thread your code as you see fit.
Prizes
- There will be limited travel support made available.
- Prizes will be awarded exponentially according to the payoff schedule below. Any winner below 80% recognition will not be paid.
- In case of ties in the scores, place awards will be based on computational requirements (lower=better).
- If this cannot be easily and fairly determined, then the prize will be split evenly.
- Prize awards if any are subject to the decisions of the challenge committee.
Exponential payoff (US dollars) for the winning algorithms in the challenge. The scores are percent correct. Scores below 80 percent generate no payoff. The above is what we intend to do; however, the actual payoff or lack of payoff, regardless of the score, performance, fairness, etc., is entirely up to the conference committee.
Schedule
| Date | Event |
| Feb 6 | Stub code and preliminary data are out |
| Feb 18 | Official training data is out |
| April 15 | Upload your code onto the server. Your code should compile and output the required data structures. |
| May 1 | All code is uploaded |
| May 9 | Conference starts |
| May 16 | Final results are here |
ENTERING THE CONTEST
Send an email with subject "Join" to: add ''gmail.com'' to end of: solutionsinperception
Older Material
Older text is stored here.