From Manual Entry to ML Magic: Building a React Native Gasoline Price Reader

January 15, 2024

The daily grind of manual data entry is a familiar pain point in many industries. Imagine field operators, day in and day out, meticulously keying in gasoline prices from station displays – a task ripe for human error and consuming valuable time. This was the challenge that sparked an idea: could I leverage the power of React Native and on-device Machine Learning to build an app that could see and understand these prices automatically? This is the story of GasPriceReader, a journey from a manual chore to an intelligent, automated solution.

The Spark: Identifying a Real-World Pain Point

The core problem was clear: field operators were spending significant time manually transcribing prices from station displays, and every keystroke was an opportunity for error.

My goal was to create an app that would empower field operators to simply:

  1. Snap a photo of a gas station's price display.
  2. Let Machine Learning accurately extract the relevant price numbers.
  3. Handle diverse real-world conditions like varying lighting, display types, and angles.
  4. Provide instant feedback to the user, confirming the captured price.

Crafting the Digital Toolkit: Our Technology Stack

To bring GasPriceReader to life, I selected a combination of powerful and flexible technologies:

// A conceptual overview of our tech stack
const gasPriceReaderTech = {
  ui: "React Native (JavaScript/TypeScript)",
  cameraInterface: "react-native-vision-camera",
  mlInferenceEngine: "TensorFlow Lite (via ML Kit)",
  opticalCharacterRecognition: "ML Kit Text Recognition On-Device API",
  appState: "Zustand",
  nativeIntegration: "React Native Native Modules (for ML processing bridge)",
};

The Development Odyssey: Navigating Key Hurdles

Building an ML-powered app, especially one interacting with the real world, comes with its unique set of challenges.

1. Mastering Camera Integration for Quality Input

Garbage in, garbage out – this adage is especially true for ML. The quality of the input image is paramount. I opted for react-native-vision-camera due to its performance and modern features.

The Challenge: Ensuring consistent, high-quality image captures. This involved handling camera permissions, selecting the right device, and tuning capture settings (quality prioritization, flash) for OCR-friendly photos, as the simplified component below shows:

// Simplified Camera Component for Price Capture
import React, { useRef, useState, useEffect } from 'react';
import { StyleSheet, View, TouchableOpacity, Text } from 'react-native';
import { Camera, useCameraDevices } from 'react-native-vision-camera';
// Assume processImageWithML is defined elsewhere, likely calling the native module
// import { processImageWithML } from './mlProcessor'; 
 
function PriceCaptureScreen({ onPriceRecognized }) {
  const camera = useRef(null);
  const devices = useCameraDevices();
  const device = devices.back; // Prioritize back camera
 
  const [hasPermission, setHasPermission] = useState(false);
  const [isCapturing, setIsCapturing] = useState(false);
 
  useEffect(() => {
    (async () => {
      const status = await Camera.requestCameraPermission();
      setHasPermission(status === 'authorized');
    })();
  }, []);
 
  const capturePhotoAndProcess = async () => {
    if (!camera.current || !device || isCapturing) return;
 
    setIsCapturing(true);
    try {
      const photo = await camera.current.takePhoto({
        qualityPrioritization: 'quality', // Prioritize quality over speed for OCR
        flash: 'off', // Usually off for reflective price signs
        enableShutterSound: true,
      });
      console.log('Photo captured:', photo.path);
      // const recognizedPrice = await processImageWithML(photo.path);
      // onPriceRecognized(recognizedPrice);
      // For this blog post, we'll simulate the ML part
      setTimeout(() => { 
        onPriceRecognized(parseFloat((Math.random() * (4.99 - 2.99) + 2.99).toFixed(2))); 
        setIsCapturing(false);
      }, 1500);
    } catch (error) {
      console.error('Failed to capture or process photo:', error);
      // Handle error: show message to user
      setIsCapturing(false);
    }
  };
 
  if (!device) return <View><Text>No camera device found.</Text></View>;
  if (!hasPermission) return <View><Text>Camera permission denied.</Text></View>;
 
  return (
    <View style={styles.container}>
      <Camera
        ref={camera}
        style={StyleSheet.absoluteFill}
        device={device}
        isActive={true}
        photo={true} // Enable photo capture
      />
      <TouchableOpacity 
        style={styles.captureButton} 
        onPress={capturePhotoAndProcess}
        disabled={isCapturing}
      >
        <Text style={styles.buttonText}>{isCapturing ? 'Processing...' : 'Capture Price'}</Text>
      </TouchableOpacity>
    </View>
  );
}
 
// Basic styling (add your own)
const styles = StyleSheet.create({
  container: { flex: 1 },
  captureButton: { position: 'absolute', bottom: 50, alignSelf: 'center', backgroundColor: 'blue', padding: 15, borderRadius: 50 },
  buttonText: { color: 'white', fontSize: 16 },
});
 
export default PriceCaptureScreen;

Key Learnings:

2. Bridging React Native and On-Device ML

Running ML inference directly on the device is key for speed, offline capability, and privacy. ML Kit's Text Recognition API simplifies this, but communication between JavaScript and the native ML processing still needs to be established.

The Challenge: Efficiently sending the image path to native code, invoking ML Kit, and returning the structured results (recognized text and bounding boxes) to JavaScript.

This typically involves creating a React Native Native Module.

// Simplified Native Module (Android - Java) for ML Processing (Conceptual)
// Uses ML Kit's on-device TextRecognition client.
// Remember to register this module in a ReactPackage so it is visible to JavaScript.

// In your Android project: MLProcessorModule.java
package com.yourproject;

import android.net.Uri;

import com.facebook.react.bridge.Promise;
import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.bridge.ReactContextBaseJavaModule;
import com.facebook.react.bridge.ReactMethod;
import com.google.mlkit.vision.common.InputImage;
import com.google.mlkit.vision.text.TextRecognition;
import com.google.mlkit.vision.text.TextRecognizer;
import com.google.mlkit.vision.text.latin.TextRecognizerOptions;

import java.io.File;
import java.io.IOException;

public class MLProcessorModule extends ReactContextBaseJavaModule {
  MLProcessorModule(ReactApplicationContext context) {
    super(context);
  }

  @Override
  public String getName() {
    return "MLProcessor";
  }

  @ReactMethod
  public void processImage(String imagePath, Promise promise) {
    try {
      // photo.path from vision-camera is a bare filesystem path, so build a file:// Uri from it
      InputImage image = InputImage.fromFilePath(getReactApplicationContext(), Uri.fromFile(new File(imagePath)));
      TextRecognizer recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);

      recognizer.process(image)
        .addOnSuccessListener(visionText -> {
          // Extract the relevant price from the raw OCR output.
          // This is where the logic to find numbers like "3.99" lives (regex or heuristics).
          String extractedPrice = parsePriceFromMLOutput(visionText.getText());
          promise.resolve(extractedPrice); // e.g. "3.99", or null if no price was found
        })
        .addOnFailureListener(e -> promise.reject("ML_PROCESSING_ERROR", "Failed to process image with ML Kit", e));
    } catch (IOException e) {
      promise.reject("IMAGE_LOAD_ERROR", "Failed to load image for ML Kit", e);
    }
  }

  private String parsePriceFromMLOutput(String text) {
    // Placeholder for more robust parsing: find patterns like X.XX or X,XX
    java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("\\b(\\d{1,2}[.,]\\d{2})\\b");
    java.util.regex.Matcher matcher = pattern.matcher(text);
    if (matcher.find()) {
      return matcher.group(1).replace(",", "."); // Normalize to a dot decimal separator
    }
    return null; // No price-like pattern found
  }
}

The JavaScript counterpart would then call NativeModules.MLProcessor.processImage(imagePath).
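For completeness, here is a minimal sketch of that JavaScript wrapper, i.e. the mlProcessor module the capture screen imported earlier. It assumes the conceptual native module above resolves a plain price string (or null when nothing was found):

// mlProcessor.js - thin JavaScript wrapper around the native MLProcessor module (sketch)
import { NativeModules } from 'react-native';

const { MLProcessor } = NativeModules;

export async function processImageWithML(imagePath) {
  // Hand the photo path to native code; ML Kit runs on-device and returns the extracted text
  const rawPrice = await MLProcessor.processImage(imagePath);

  if (rawPrice == null) {
    throw new Error('No price could be recognized in the image');
  }

  const price = parseFloat(rawPrice);
  if (Number.isNaN(price)) {
    throw new Error(`Unexpected ML output: ${rawPrice}`);
  }
  return price;
}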

Key Learnings:

3. Tackling Real-World Variability

Lab conditions are one thing; the wild is another. Price signs come in all shapes, sizes, fonts, and states of repair. Lighting conditions vary dramatically.

Challenges & Solutions:
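One guardrail that illustrates the general approach: sanity-check every recognized value before accepting it, and fall back to manual entry when the check fails. A minimal sketch (the helper name and thresholds are illustrative assumptions, not the app's exact logic):

// priceSanity.js - illustrative plausibility check on the OCR result (thresholds are assumptions)
const MIN_PLAUSIBLE_PRICE = 1.0;   // below this, the OCR likely dropped a digit
const MAX_PLAUSIBLE_PRICE = 9.999; // per-gallon prices rarely exceed this

export function isPlausiblePrice(price) {
  return (
    typeof price === 'number' &&
    !Number.isNaN(price) &&
    price >= MIN_PLAUSIBLE_PRICE &&
    price <= MAX_PLAUSIBLE_PRICE
  );
}

// Usage: if the check fails, prompt the operator to retake the photo or enter the price manually.
// if (!isPlausiblePrice(recognizedPrice)) { showManualEntryFallback(); }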

The Payoff: Measurable Success and Valuable Insights

After several iterations, robust testing, and refinement based on real-world usage, GasPriceReader started delivering tangible results:

Lessons Forged in Code: My Key Takeaways

This project was a fantastic learning experience, reinforcing several key principles:

  1. Choose the Right Tools for the Job: react-native-vision-camera for robust camera control and ML Kit for accessible on-device OCR were game-changers. Don't reinvent the wheel if mature solutions exist.
  2. Native for Performance-Critical Tasks: For image processing and ML inference, leveraging native code through modules is often essential for optimal performance on mobile devices.
  3. Robust Error Handling is Non-Negotiable: Especially with ML, where inputs can be unpredictable, anticipate failures and provide clear, actionable feedback to the user. Allow for manual overrides.
  4. Iterate Based on Real-World Testing: What works in the lab might falter in diverse real-world scenarios. Continuously test with varied inputs (different lighting, sign types, angles) and refine your models/logic.
  5. User Experience is Paramount: Even the most sophisticated ML is useless if the app is clunky. Provide clear instructions, loading indicators, and easy ways to correct errors.
  6. The "Last Mile" of ML is Parsing: Getting raw text from OCR is only half the battle. The logic to interpret that text and extract the specific, structured data you need is often complex and requires careful thought (see the sketch after this list).

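To make that "last mile" point concrete, here is a rough JavaScript-side sketch of the kind of parsing involved, mirroring the regex in the conceptual native module above (the candidate-selection step is illustrative):

// parsePrices.js - illustrative "last mile" parsing of raw OCR text (sketch)
export function extractCandidatePrices(ocrText) {
  // Match values like "3.99" or "3,99" anywhere in the recognized text
  const matches = ocrText.match(/\b\d{1,2}[.,]\d{2}\b/g) || [];
  return matches.map((m) => parseFloat(m.replace(',', '.')));
}

// A sign often contains unrelated numbers, so downstream logic still has to pick
// the candidate that corresponds to the fuel grade the operator cares about.
// extractCandidatePrices('REGULAR 3.49  DIESEL 3,89') -> [3.49, 3.89]
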
The Road Ahead

While GasPriceReader is already making an impact, there's always room for improvement:

Final Thoughts: The Power of Hybrid Innovation

The GasPriceReader project beautifully illustrates how React Native, combined with the growing capabilities of on-device Machine Learning, can create powerful, practical solutions to real-world problems. By bridging the ease of JavaScript development with the performance of native ML, we can build truly intelligent mobile applications that save time, reduce errors, and enhance user productivity.

Stay tuned for more explorations at the intersection of React Native and Machine Learning!


Have you worked on similar ML-driven data capture projects? Share your experiences or questions in the comments below!