Reading: AN INTUITIVE EXPLANATION OF CONVOLUTIONAL NEURAL NETWORKS
I found Ujjwal Karn's article to be a very accessible entry point into the world of convolutional neural networks. What really struck a chord with me was how he broke down the convolution, ReLU, pooling, and fully connected layers into intuitive operations rather than just math. The way I thought about image data shifted: instead of seeing an image as a flat block of pixels, I started thinking of it as a stack of matrices with filters sliding over them, which helped me internalize what "feature extraction" actually means. Also, the way he explained how CNNs build up their understanding, from detecting simple edges to complex shapes, really mirrored the way I imagined the human visual system works. There was a clarity in his writing that made me feel I could explain these layers to someone without a deep math background. More than anything, this reading increased my confidence that I can tackle CNNs without being intimidated by the depth of deep learning architectures.
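To make the "filter sliding over a matrix" idea concrete, here is a minimal sketch of a single-channel 2D convolution in plain JavaScript (valid padding, stride 1). The function name `convolve2d` and the example kernel are my own illustration, not from the article:

```javascript
// Slide a small kernel over an image matrix and compute the weighted
// sum at each position — the core operation of a convolutional layer.
function convolve2d(image, kernel) {
  const kh = kernel.length, kw = kernel[0].length;
  const oh = image.length - kh + 1, ow = image[0].length - kw + 1;
  const out = [];
  for (let y = 0; y < oh; y++) {
    const row = [];
    for (let x = 0; x < ow; x++) {
      let sum = 0;
      for (let ky = 0; ky < kh; ky++) {
        for (let kx = 0; kx < kw; kx++) {
          sum += image[y + ky][x + kx] * kernel[ky][kx];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// A classic vertical-edge detector: it responds where pixel values
// change from left to right, and gives 0 on flat regions.
const edgeKernel = [
  [1, 0, -1],
  [1, 0, -1],
  [1, 0, -1],
];

// On a uniform 3×3 patch there is no edge, so the response is 0.
console.log(convolve2d([[1, 1, 1], [1, 1, 1], [1, 1, 1]], edgeKernel)); // [[0]]
```

Stacking many such filters, applying ReLU, and pooling the results is exactly the edge-to-shape hierarchy the article describes.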
Training Image Classifiers
For this project, I built an interactive webcam image classifier in p5.js using ml5.js as the machine learning library. I was inspired by one of my professor’s reference examples where a webcam feed was used to train a model with two labels, “A” and “B.” I wanted to take that concept further by redesigning the interface to be more polished and intuitive. I added labeled input fields so users could rename the categories (for example, “Smile” vs “Neutral”), progress bars to show the training status, and live confidence updates. My goal was to make the interaction feel less like a raw experiment and more like a finished, user-friendly application.
The program works by first capturing frames from the webcam using the p5.js createCapture() function. When I click “Add Example A” or “Add Example B,” the current frame is resized to 64×64 pixels and its pixel data is normalized between 0 and 1. These inputs, along with their corresponding labels, are then added to the ml5 neural network using the addData() method. Once enough samples are collected, I can train the model by clicking “Train Model,” which calls classifier.train(). The model learns from the pixel data through multiple epochs, and once training is complete, the program continuously calls classifier.classify() to predict new webcam frames in real time. The classification results are displayed on screen with a confidence score that updates every frame.
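The preprocessing step described above can be sketched as a small pure-JavaScript helper. This is my own illustrative version, not code from the project: it assumes a flat RGBA pixel array (values 0–255, as p5.js exposes in `video.pixels` after a `loadPixels()` call) and produces the normalized 0–1 inputs passed to `addData()`; the name `normalizePixels` is hypothetical:

```javascript
// Convert a flat RGBA pixel array (0–255) into a flat array of
// RGB values normalized to [0, 1], dropping the alpha channel —
// the kind of raw input an ml5 neural network can train on.
function normalizePixels(rgba) {
  const inputs = [];
  for (let i = 0; i < rgba.length; i += 4) {
    inputs.push(rgba[i] / 255);     // red
    inputs.push(rgba[i + 1] / 255); // green
    inputs.push(rgba[i + 2] / 255); // blue
    // rgba[i + 3] (alpha) is ignored
  }
  return inputs;
}

// Example: one fully red, fully opaque pixel.
console.log(normalizePixels([255, 0, 0, 255])); // [1, 0, 0]
```

For a 64×64 frame this yields 64 × 64 × 3 = 12,288 input values per training example, which is why shrinking the capture size matters so much for performance.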
One challenge I faced was managing performance in the p5.js Web Editor—sometimes the webcam feed would lag when too many samples were added, so I had to reduce the input size from 128×128 to 64×64 for smoother performance. I also struggled a bit with getting the mirrored webcam effect to display correctly, since a webcam feed in p5.js is not mirrored by default and has to be flipped manually when the image is drawn. Debugging the asynchronous nature of training and classification in ml5.js also required trial and error, especially when chaining callbacks for real-time updates. Overall, going through this process helped me understand how image classification works at a more practical level. I liked seeing how quickly the neural network could adapt to different visual patterns, and how accessible ml5.js makes machine learning for creative applications.
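The mirroring fix can be shown at the pixel level. In a p5.js sketch the usual approach is a canvas transform (`translate(width, 0); scale(-1, 1);` before `image()`), but the pure-pixel version below makes the logic explicit; the helper name `mirrorHorizontal` is my own, not from the project:

```javascript
// Mirror a flat RGBA pixel buffer horizontally so the preview behaves
// like a mirror: each row's pixels are written back in reverse order.
function mirrorHorizontal(rgba, width, height) {
  const out = new Array(rgba.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const src = (y * width + x) * 4;
      const dst = (y * width + (width - 1 - x)) * 4;
      out[dst] = rgba[src];         // red
      out[dst + 1] = rgba[src + 1]; // green
      out[dst + 2] = rgba[src + 2]; // blue
      out[dst + 3] = rgba[src + 3]; // alpha
    }
  }
  return out;
}
```

One design note: if the training examples are captured unmirrored but the preview is mirrored (or vice versa), the classifier sees different data than the user does, so it is safest to apply the same orientation in both places.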
Picture:

P5.js Link: https://editor.p5js.org/Anna_Tang/sketches/tCoMJokF6