Computer Vision · Vision Transformers · Jan 2022 – May 2022 · Final-Year Undergraduate Project

🌿 Weed Detection & Classification using Deep Learning

Comparing Vision Transformers against leading CNN architectures to accurately classify grass and broadleaf weeds — enabling targeted spraying that cuts herbicide waste and protects the environment.

Situation · The context

Blanket chemical spraying across large fields wastes herbicides and labour, pollutes the environment, and degrades food quality. Accurately identifying weeds so they can be sprayed selectively is critical for sustainable agriculture — but CNNs demand heavy compute, are prone to overfitting, and need large, balanced labelled datasets.

Task · The objective

Investigate whether Vision Transformers, which adapt the Transformer architecture originally developed for NLP to images, can match or beat established CNNs at classifying grass versus broadleaf weeds, and deliver a modular pipeline for training and testing weed-classification models.
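
For readers new to the architecture: a ViT does not slide convolutional filters over the image. It cuts the image into fixed-size patches and treats each flattened patch as a token, much as a language model treats words. The sketch below illustrates only that tokenisation step, in plain Python on a toy single-channel image. The function names `split_into_patches` and `num_patches` are hypothetical, and the 224×224 input with 16×16 patches is a commonly used ViT configuration assumed for illustration; this write-up does not state the settings used in the project.

```python
from typing import List


def split_into_patches(image: List[List[int]], patch: int) -> List[List[int]]:
    """Split an H x W single-channel image into flattened, non-overlapping
    patch x patch tiles, returned in row-major order (one list per token)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single token vector.
            tokens.append(
                [image[r][c]
                 for r in range(top, top + patch)
                 for c in range(left, left + patch)]
            )
    return tokens


def num_patches(height: int, width: int, patch: int) -> int:
    """Sequence length a ViT would see for a given image and patch size."""
    return (height // patch) * (width // patch)


# Toy 4x4 image whose pixel values are 0..15, split into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = split_into_patches(image, patch=2)
print(tokens[0])                   # first patch, top-left corner
print(num_patches(224, 224, 16))   # assumed ViT-style input and patch size
```

In a real ViT each flattened patch is then linearly projected and given a position embedding before entering the Transformer encoder; those steps are omitted here.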

Action · What I built

  • Trained and benchmarked models on a dataset of 17,509 labelled images of weeds native to Australia.
  • Built preprocessing to handle grass and broadleaf weed images and a modular pipeline for training, validation and testing.
  • Implemented and compared four state-of-the-art CNN architectures — ResNet-50, Xception, Inception V3 and Inception-ResNet V2.
  • Introduced a Vision Transformer (ViT) as a candidate replacement for CNNs and evaluated it head-to-head on the same data.
  • Analysed and visualised model predictions in a Jupyter Notebook to compare accuracy across all architectures.
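
One way to keep such a pipeline modular is a registry that maps each architecture name to a builder function, so the same training and evaluation code can run over every model. The sketch below is an illustration, not the project's actual code: `MODEL_REGISTRY`, `register_model` and `_keras_backbone` are hypothetical names, and the input shape, randomly initialised weights (`weights=None`), optimiser and loss are all assumptions. The backbone classes themselves are the ones shipped in `tf.keras.applications`.

```python
from typing import Callable, Dict

INPUT_SHAPE = (224, 224, 3)  # assumed resolution; not stated in the write-up
NUM_CLASSES = 2              # grass vs broadleaf


def _keras_backbone(app_name: str) -> Callable:
    """Return a builder that wraps a tf.keras.applications backbone
    with a small softmax classification head."""
    def build():
        # Imported lazily so the registry can be inspected without TensorFlow.
        import tensorflow as tf

        backbone_cls = getattr(tf.keras.applications, app_name)
        backbone = backbone_cls(
            include_top=False,        # drop the ImageNet classifier
            weights=None,             # assumption: train from scratch
            input_shape=INPUT_SHAPE,
            pooling="avg",            # global average pooling -> feature vector
        )
        outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(
            backbone.output
        )
        model = tf.keras.Model(backbone.input, outputs, name=app_name)
        model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model
    return build


MODEL_REGISTRY: Dict[str, Callable] = {
    "ResNet-50": _keras_backbone("ResNet50"),
    "Xception": _keras_backbone("Xception"),
    "Inception V3": _keras_backbone("InceptionV3"),
    "Inception-ResNet V2": _keras_backbone("InceptionResNetV2"),
}


def register_model(name: str, builder: Callable) -> None:
    """Add another architecture (for example a custom ViT builder)."""
    MODEL_REGISTRY[name] = builder
```

With TensorFlow installed, `MODEL_REGISTRY["ResNet-50"]()` would return a compiled model, and a ViT builder could be added through `register_model` so that it is trained and evaluated on exactly the same data splits as the CNNs.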

Result · The outcome

  • The Vision Transformer achieved the highest accuracy at 96.41%, outperforming every CNN baseline.
  • ResNet-50 followed at 95.70%, then Xception at 95.04%, Inception V3 at 94.7%, and Inception-ResNet V2 at 94.15%.
  • Showed that, on this dataset, transformers are a viable alternative to CNNs for weed classification and outperformed all four CNN baselines, supporting more precise, sustainable spraying.
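
The ranking and the size of the gap can be checked directly from the reported figures. The short script below uses only the accuracies listed above; the helper names `rank_models` and `margin_over_runner_up` are hypothetical, and the error rate is simply 100 minus the reported accuracy.

```python
# Accuracies (in percent) exactly as reported in the results above.
REPORTED_ACCURACY = {
    "Vision Transformer": 96.41,
    "ResNet-50": 95.70,
    "Xception": 95.04,
    "Inception V3": 94.7,
    "Inception-ResNet V2": 94.15,
}


def rank_models(accuracy):
    """Return (name, accuracy) pairs sorted from best to worst."""
    return sorted(accuracy.items(), key=lambda item: item[1], reverse=True)


def margin_over_runner_up(accuracy):
    """Gap between the best and second-best model, in percentage points."""
    ranked = rank_models(accuracy)
    return round(ranked[0][1] - ranked[1][1], 2)


ranking = rank_models(REPORTED_ACCURACY)
for name, acc in ranking:
    print(f"{name:<22}{acc:6.2f}%  (error {100 - acc:.2f}%)")

margin = margin_over_runner_up(REPORTED_ACCURACY)
print(f"Best model leads the runner-up by {margin} percentage points")
```

The ViT's lead over ResNet-50 is 0.71 percentage points, so the comparison is close; a single accuracy figure per model does not show how much of that gap would persist across different data splits or random seeds.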

  • 96.41%: Vision Transformer accuracy (best)
  • 17,509: Labelled training images
  • 5: Architectures benchmarked

Tech Stack

Python · Vision Transformer (ViT) · ResNet-50 · Xception · Inception V3 · Inception-ResNet V2 · TensorFlow / Keras · Computer Vision · Jupyter Notebook