A Better Perspective
By 2050, the human population is predicted to exceed 9 billion people, placing enormous strain on food production. Sustainable crop production has therefore become an increasingly important area of bioscience. Plant phenotyping, the analysis of observable plant characteristics, directly informs the development of crop management technologies and has become a popular topic in computer vision. Many crop management systems rely heavily on the latest machine learning techniques to keep up with ever-increasing demand. However, the bottleneck in these systems is often not the underlying logic powering the models, but the training and input images: many models require a credible visual representation of a plant in order to achieve valid classification/localisation. Current systems for capturing effective plant data are limited because they cannot discover the important perspectives needed to gain a legitimate interpretation of the plant's 3D structure, and many plant phenotyping models suffer performance penalties as a result of subpar training and input images.

Static image capturing techniques require little supervision and can capture continuous images of plants over a specified timeframe. However, they often fail to capture the views required for effective analysis of the plant: once the camera positions are chosen, they will not change. Meanwhile, dynamic image capturing involves a human participant actively capturing the perspectives needed for effective analysis of the plant and for gathering coherent training data. The downside is that this requires a participant, often a professional in the biological sciences, to manually capture the required images. Ultimately, the aim of this project is to combine the benefits of these two techniques. There is clearly a market for a system which can learn the camera perspectives of a plant with the highest intrinsic value, producing the most accurate results when its images are used as input to other models. At the time of writing, there are few solutions for automatic, dynamic plant image capturing.
              Static        Dynamic
Supervision   Little        Excessive
Cost          Cheap/Fast    Expensive/Slow
Image Value   Poor          Good
Benefits and drawbacks of static and dynamic image capturing methods for plant phenotyping models
Project Proposal

This project proposes a new approach to automatic image capturing: a UR5 robotic arm fitted with an RGB-D (IR) camera that can traverse to any potential perspective around an object, which in this case will be a plant. A reinforcement learning agent will be trained to control the robotic arm, predicting trajectories around the plant to optimal perspectives that will be used for image capturing. The model will be general enough that a unique, unseen plant can be analysed effectively by the agent to find prime perspectives for image capturing.


Hence, by placing a plant in the centre of the robot's view space, the agent can iteratively analyse its current camera view and then traverse to an optimal perspective to capture the best image of that unique plant. Each camera view will have RGB, depth and infrared data, which the agent examines to calculate its next action around the plant. The benefit of this system is that it is much faster than manually finding and capturing the best camera image, and the agent requires very limited human supervision while being able to produce images with high intrinsic value. Overall, this system combines the benefits of both static and dynamic image capturing while avoiding most of their drawbacks.
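The iterative analyse-then-traverse loop described above can be sketched as follows. The `agent` and `env` interfaces here are hypothetical stand-ins for the trained policy and the camera/arm system, used purely for illustration:

```python
import numpy as np

def choose_next_view(agent, env, max_steps=10):
    """Iteratively move the camera toward an optimal perspective.

    `agent` maps an observation (stacked RGB, depth and IR data) to a
    relative camera motion and can score a view's intrinsic value;
    `env` applies the motion and returns the new view. Both interfaces
    are hypothetical, for illustration only.
    """
    obs = env.reset()                       # initial RGB-D (IR) view of the plant
    best_obs, best_value = obs, agent.value(obs)
    for _ in range(max_steps):
        action = agent.act(obs)             # predicted trajectory step
        obs = env.step(action)              # traverse and re-image the plant
        value = agent.value(obs)            # estimated intrinsic value of this view
        if value > best_value:
            best_obs, best_value = obs, value
    return best_obs                         # image from the best perspective found
```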


Image of a RGB camera attached to the UR5 robotic arm, rotated to focus on a plant subject.

View Synthesis

There are several issues with training a reinforcement learning agent on real, static plants. Firstly, most plant species decay over time unless kept under specific conditions. Hence, real-time training becomes problematic, since the plant will only last a few days before becoming unsuitable for training. Furthermore, the robotic arm needs constant supervision when traversing around the plant to avoid accidental collisions, which poses a risk when no operator is present. Therefore, an offline learning approach is needed for training the agent, in which the agent moves virtually around the plant and an image of the plant is generated from each chosen perspective. This can be achieved using view synthesis.


View synthesis involves generating a continuous 3D representation of an environment from a set of 2D images. Given a position and view direction, a view-synthesis model can predict what an image of that environment, from that perspective, will look like. Training involves overfitting a model on a set of 2D images of an environment so that the model can predict novel views. The benefit for this project is that, given a set of 360-degree images of a plant, an efficient view-synthesis model can render that plant from any perspective; this allows for offline training.
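Querying a view-synthesis model starts from a camera pose. As a rough sketch of how a position and view direction translate into per-pixel queries, the following generates one ray per pixel for a pinhole camera; the OpenGL-style camera convention (looking down −z) is an assumption for illustration, not a detail of this project:

```python
import numpy as np

def camera_rays(c2w, H, W, focal):
    """Generate one ray per pixel for an assumed pinhole camera.

    `c2w` is a 4x4 camera-to-world matrix. Returns ray origins and unit
    ray directions in world space, which a view-synthesis model can be
    queried along to render the novel view.
    """
    i, j = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
    # Per-pixel directions in camera space (camera looks down -z)
    dirs = np.stack([(i - W / 2) / focal,
                     -(j - H / 2) / focal,
                     -np.ones_like(i, dtype=float)], axis=-1)
    rays_d = dirs @ c2w[:3, :3].T                         # rotate into world space
    rays_d /= np.linalg.norm(rays_d, axis=-1, keepdims=True)
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)    # all rays share the origin
    return rays_o, rays_d
```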


Figure showing how novel views can be synthesised from a set of 2D images


Neural Radiance Fields (NeRFs, 2020) are a relatively new approach to high-quality view synthesis. Essentially, a multi-layer perceptron is used to predict the colour and volume density of every point in the continuous 3D environment. Then, using volume rendering, a 2D image can be reconstructed by shooting rays into the environment and accumulating the final colour along each ray. I recommend reading the original paper about NeRFs here. NeRFs are therefore a strong candidate for generating 3D representations of each of the plants.
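The volume rendering step can be made concrete. Assuming the standard NeRF compositing formula, the final colour of a ray is a transmittance-weighted sum of the colours the MLP predicts at samples along that ray:

```python
import numpy as np

def composite(colors, sigmas, deltas):
    """Standard NeRF volume rendering along a single ray.

    colors: (N, 3) RGB values predicted by the MLP at N samples,
    sigmas: (N,) volume densities, deltas: (N,) distances between samples.
    Returns the pixel colour C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i is the transmittance accumulated before sample i.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                   # opacity per segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)
```

A fully opaque sample returns its own colour and occludes everything behind it, while zero density everywhere renders to black, which matches the intuition of rays passing through empty space.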


Simplistic overview of the process of generating a 3D representation of plant objects from 2D images using a view synthesis model (such as neural radiance fields).


In order to generate a series of different plant NeRFs, there needs to be a dataset containing a large number of 2D views of each plant. This will be captured using a UR5 robotic arm. Click here to find out how we are going to capture this data.
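As an illustration of what such a capture plan might look like, the sketch below places candidate camera positions on a hemisphere around the plant. The yaw/pitch ranges and counts are assumptions for illustration; only the 85 cm radius comes from the robot workspace described in the Robot Control section:

```python
import numpy as np

def hemisphere_poses(n_yaw=12, n_pitch=3, radius=0.85):
    """Candidate camera positions on a hemisphere around the plant.

    Sweeps yaw over a full circle and pitch over a few elevations above
    the horizontal at a fixed radius (0.85 m, the UR5's stated reach).
    Returns an (n_yaw * n_pitch, 3) array of XYZ positions with the
    plant at the origin. Angle ranges are illustrative assumptions.
    """
    yaws = np.linspace(0, 2 * np.pi, n_yaw, endpoint=False)
    pitches = np.linspace(np.pi / 8, np.pi / 2.2, n_pitch)   # assumed elevation range
    pts = []
    for p in pitches:
        for y in yaws:
            pts.append([radius * np.cos(p) * np.cos(y),
                        radius * np.cos(p) * np.sin(y),
                        radius * np.sin(p)])
    return np.asarray(pts)
```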

Robot Control



Once an agent has been trained offline on the plants, it can be transferred to control the UR5 robot in real time. The UR5 arm will be suspended above the plant from the ceiling, allowing the camera to capture all views around the plant within an 85 cm radius of the robot's base. An RGB-D (IR) camera attached to the end of the arm will provide the current view for the agent to analyse; this is how it determines the next position to move to. The entire lab will be modelled inside a Gazebo environment, and the path planning logic will be performed using MoveIt.
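A minimal sketch of commanding the arm through MoveIt's Python interface might look like the following. Here `group` is assumed to be a `moveit_commander.MoveGroupCommander` for the UR5's planning group (e.g. "manipulator"); the calls shown (`set_pose_target`, `go`, `stop`, `clear_pose_targets`) are standard MoveGroupCommander methods, but the wrapper itself is a hypothetical helper:

```python
def move_to_view(group, pose):
    """Send the UR5 end-effector to a capture pose via MoveIt.

    `group` is assumed to be a moveit_commander.MoveGroupCommander and
    `pose` a geometry_msgs Pose for the target camera perspective.
    Returns True if planning and execution succeeded.
    """
    group.set_pose_target(pose)   # plan toward the target camera pose
    ok = group.go(wait=True)      # execute the planned trajectory
    group.stop()                  # ensure no residual motion
    group.clear_pose_targets()    # clean up targets for the next request
    return ok
```

In the real system this would run inside a ROS node, with the Gazebo model providing a safe stand-in for the physical lab during development.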

Project Objectives
  1. To train a reinforcement learning agent to predict trajectories to optimal camera perspectives (based on important biological characteristics) around a 3D plant representation.
  2. To develop a bespoke, effective neural radiance field model that can generate high quality 3D representations of a variety of plants that can be used for continuous training of an agent.
  3. To optimise the model so that training times are limited and only a few views are required to choose optimal perspectives, ensuring that the model can be used in real-world applications.
  4. To generate a high-quality dataset of 360-degree views of a series of different plants that can be used for generating effective 3D representations.
The Big Picture

While this project may seem very niche and theoretical, if it is successful, it has the potential to offer several benefits to both the agriculture and computer science industries:

  1. The trained agent can be paired with other plant phenotyping models to improve their performance; if the agent captures better evaluation data, the paired model can make more accurate predictions.
  2. Techniques for training the image capturing agent could be extended in the future to incorporate other robotic movement, such as drones, in order to capture image data in a crop field.
  3. The dataset of plant images can be used for training/testing other newly developed NeRF techniques.
  4. The agent can be trained to capture different types of objects other than just plants.

Overall, this project is only in its infancy, but I hope with time this is able to contribute to the plant phenotyping community. Thank you for reading.
