Computer Vision (CSA401) / Computer Vision Lab (CAL401)

Theory Syllabus

Computer Vision

Computer vision is a field of artificial intelligence that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

Computer vision tasks include:

  • Object detection: Identifying and locating objects in an image or video.
  • Object classification: Determining the category of an object.
  • Scene understanding: Understanding the context of an image or video, such as the location of objects and their relationships to each other.
  • Motion tracking: Tracking the movement of objects over time.
  • Face recognition: Identifying and verifying people from their facial features.

Computer vision is used in a wide variety of applications, including:

  • Self-driving cars: Computer vision is used to help self-driving cars see the road, other cars, pedestrians, and other objects.
  • Virtual reality and augmented reality: Computer vision is used to create virtual and augmented reality experiences.
  • Security: Computer vision is used to detect and track people and objects in security applications.
  • Medical imaging: Computer vision is used to analyze medical images, such as X-rays and MRI scans.
  • Retail: Computer vision is used to track inventory and detect shoplifters.

Computer vision is a rapidly growing field, and new applications are being developed all the time. As computer vision technology continues to improve, we can expect to see even more innovative and exciting applications in the future.

How Computers Read Images

A computer reads an image by first converting it into a digital format. This is done by dividing the image into a grid of tiny squares called pixels. Each pixel is assigned a value that represents its color. In a grayscale image, a pixel has a single value ranging from 0 (black) to 255 (white). In a color image, each pixel has three values, one for each of the primary colors red, green, and blue, each ranging from 0 (none of that color) to 255 (full intensity).
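
As a minimal illustration, here is how those pixel values can be inspected with OpenCV and NumPy in Python (the file name "photo.jpg" is a placeholder):

    import cv2

    # Load in grayscale: a 2-D array of intensities in the range [0, 255].
    gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
    print(gray.shape)    # (height, width)
    print(gray[0, 0])    # top-left pixel: 0 = black, 255 = white

    # Load in color: a 3-D array; note that OpenCV orders channels Blue, Green, Red.
    bgr = cv2.imread("photo.jpg", cv2.IMREAD_COLOR)
    print(bgr.shape)     # (height, width, 3)
    b, g, r = bgr[0, 0]  # per-channel values of the top-left pixel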

Once the image has been converted into a digital format, the computer can use various algorithms to analyze it. These algorithms can identify objects in the image, classify the image, or understand its context.

For example, an object detection algorithm might look for specific patterns of pixels that correspond to known objects. A classification algorithm might look for the distribution of pixel values in an image and assign the image to a particular category. A context-understanding algorithm might look at the relationships between objects in an image and try to understand what the image is depicting.


How Computers Store Images

Images are stored in a computer as a matrix of numbers known as pixel values. These pixel values represent the intensity of each pixel. In grayscale images, a pixel value of 0 represents black, and 255 represents white. In color images, the pixel values represent the amount of red, green, and blue in each pixel.

The number of pixels in an image is determined by its resolution. The higher the resolution, the more pixels the image has, and the more detailed the image will be.

Images are typically stored in a file format that is specific to the type of image. For example, JPEG is a common file format for storing digital photographs. PNG is another common file format for storing digital images.
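
To make the "matrix of numbers" idea concrete, here is a minimal Python sketch that builds an image directly as a NumPy array and saves it as a PNG (the file name "gradient.png" is an arbitrary choice):

    import numpy as np
    import cv2

    # A 256x256 grayscale image: every row runs from 0 (black, left)
    # to 255 (white, right), producing a horizontal gradient.
    ramp = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
    cv2.imwrite("gradient.png", ramp)  # PNG stores the matrix losslessly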

 

Low-Level Computer Vision:

Low-level computer vision focuses on basic image processing and understanding the raw pixel data of an image. It involves fundamental tasks like edge detection, image filtering, and image enhancement. These techniques are often used as the initial steps in more complex computer vision tasks. Some common low-level computer vision techniques are:

a. Edge Detection: This technique aims to identify the boundaries between different objects or regions in an image. Popular edge detection algorithms include the Sobel operator and the Canny edge detector.
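
A minimal sketch of both algorithms using OpenCV in Python ("photo.jpg" is a placeholder input):

    import cv2

    gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

    # Sobel operator: approximate horizontal and vertical intensity gradients.
    grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

    # Canny detector: 100 and 200 are the hysteresis thresholds.
    edges = cv2.Canny(gray, 100, 200)
    cv2.imwrite("edges.png", edges)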

 

b. Image Filtering: Image filters are used to remove noise from images, enhance certain features, or smooth out an image. Examples of filters are Gaussian blur, median filter, and sharpening filter.
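
A minimal sketch of these filters with OpenCV ("photo.jpg" is again a placeholder):

    import numpy as np
    import cv2

    img = cv2.imread("photo.jpg")

    blurred  = cv2.GaussianBlur(img, (5, 5), 0)  # smooths fine detail and noise
    denoised = cv2.medianBlur(img, 5)            # good against salt-and-pepper noise

    # A simple sharpening kernel: boost the center pixel relative to its neighbors.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]])
    sharpened = cv2.filter2D(img, -1, kernel)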

 

c. Image Thresholding: Thresholding is used to segment an image into different regions based on pixel intensity. For instance, converting a grayscale image into a binary image (black and white) using a specific threshold value.
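
A minimal thresholding sketch with OpenCV ("photo.jpg" is a placeholder):

    import cv2

    gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

    # Fixed threshold: pixels above 127 become white (255), the rest black (0).
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Otsu's method chooses the threshold automatically from the histogram.
    _, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)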

 

Example: Suppose you have an image with different objects in it. Low-level computer vision techniques can help identify the edges of these objects, filter out any noise present, and enhance the contrast of the image.

 

Mid-Level Computer Vision:

Mid-level computer vision involves more advanced techniques that aim to extract meaningful information from the input image. This level of computer vision focuses on tasks like object recognition, image segmentation, and optical flow. Mid-level techniques often require more sophisticated algorithms and may involve some knowledge of the context of the image. Some examples of mid-level computer vision tasks are:

a. Object Recognition: This task involves detecting and identifying specific objects or patterns within an image. It is used in applications like face recognition, object detection in autonomous vehicles, etc.

 

b. Image Segmentation: Image segmentation is the process of dividing an image into meaningful regions or segments. This helps in isolating specific objects or regions within the image.
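
As one simple approach among many, the sketch below segments an image by thresholding it and then labeling each connected region with OpenCV ("photo.jpg" is a placeholder):

    import cv2

    gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Every connected white region gets its own integer label (0 = background).
    num_labels, labels = cv2.connectedComponents(binary)
    print(f"Found {num_labels - 1} segments")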

 

c. Optical Flow: Optical flow computes the motion vectors of pixels between consecutive frames in a video, allowing the tracking of moving objects.
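
A minimal dense optical-flow sketch using OpenCV's Farneback algorithm ("video.mp4" is a placeholder input):

    import cv2

    cap = cv2.VideoCapture("video.mp4")
    _, prev = cap.read()
    _, curr = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

    # flow[y, x] holds the (dx, dy) motion vector of each pixel between frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)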

 

Example: In a self-driving car, mid-level computer vision techniques are used to recognize pedestrians, vehicles, and traffic signs, segment the road from the surroundings, and track the motion of nearby objects to avoid collisions.

 

High-Level Computer Vision:

High-level computer vision involves the highest level of understanding and interpretation of images. This level often involves the use of machine learning and deep learning techniques to recognize complex patterns and make decisions based on the visual input. High-level computer vision tasks include:

a. Object Detection and Recognition: High-level computer vision systems can not only identify objects but also recognize specific instances of those objects.

 

b. Image Captioning: Generating a natural language description of an image, explaining what is happening in the scene.

 

c. Image-to-Image Translation: Converting images from one domain to another, like turning sketches into realistic images or day-to-night image translation.

 

Example: High-level computer vision is employed in autonomous vehicles to recognize different types of vehicles, pedestrians, cyclists, traffic lights, and road signs. It processes a vast amount of visual data to make informed decisions while driving.

 

In summary, low-level computer vision deals with basic image processing tasks, mid-level computer vision focuses on extracting meaningful information, and high-level computer vision involves complex pattern recognition and decision-making using machine learning and deep learning techniques. The combination of these levels plays a crucial role in building advanced computer vision applications that can interpret and interact with the visual world.

 

Overview of Diverse Computer Vision Applications: Document Image Analysis, Biometrics, Object Recognition, Tracking, Medical Image Analysis

Computer vision is a rapidly growing field with diverse applications that leverage artificial intelligence and machine learning techniques to interpret and understand visual information from the world. Here's a detailed overview of some key applications:

  1. Document Image Analysis: Document Image Analysis involves processing and understanding documents, such as scanned papers, forms, and handwriting. Optical Character Recognition (OCR) is a significant part of this application, where computer vision algorithms extract text from images and convert it into editable, searchable, and machine-readable formats. This technology is widely used in digitizing archives, automating data entry, and improving accessibility for visually impaired individuals.
  2. Biometrics: Biometrics uses computer vision to analyze and recognize unique physical or behavioral characteristics of individuals for identity verification. Facial recognition is a well-known biometric application that identifies individuals by analyzing facial features. Other biometric techniques include fingerprint recognition, iris scanning, voice recognition, and gait analysis. Biometrics finds applications in security systems, access control, and identity verification in various industries.
  3. Object Recognition: Object recognition aims to identify and classify objects or specific patterns within images or videos. Deep learning techniques, particularly Convolutional Neural Networks (CNNs), have revolutionized object recognition by achieving high accuracy in detecting and categorizing objects. This application is used in autonomous vehicles, surveillance systems, robotics, and augmented reality, among others.
  4. Object Tracking: Object tracking focuses on following the movement of specific objects in videos or image sequences over time. It involves identifying and locating the object in each frame and linking them to create a trajectory. Object tracking has applications in video surveillance, activity recognition, visual analytics, and robotics.
  5. Medical Image Analysis: Medical Image Analysis involves the processing and interpretation of medical images like X-rays, MRIs, CT scans, and histopathological images. Computer vision algorithms assist medical professionals in detecting and diagnosing diseases, identifying anomalies, and segmenting organs or tumors. It plays a crucial role in medical diagnosis, treatment planning, and research in fields like radiology, pathology, and oncology.

Some specific medical image analysis tasks include:

  • Tumor Detection: Identifying and localizing tumors in medical images to aid in cancer diagnosis and treatment planning.
  • Image Segmentation: Dividing medical images into meaningful regions for precise analysis and measurement.
  • Disease Classification: Classifying medical images to determine the presence or severity of specific diseases.
  • Surgical Navigation: Assisting surgeons by providing real-time feedback and guidance during procedures using imaging data.

These computer vision applications continue to advance and find new use cases as technology evolves and our understanding of visual data improves. With ongoing research and development, computer vision is expected to have an even more significant impact across various industries in the future.

Document Image Analysis

Document Image Analysis (DIA) is a computer vision application that focuses on processing and understanding documents in various formats, such as scanned papers, forms, and handwriting. The primary goal of DIA is to extract meaningful information from document images, enabling automated data extraction, searchability, and analysis. It involves several key tasks, including Optical Character Recognition (OCR), layout analysis, text extraction, and document classification.

 

Here's a detailed explanation of Document Image Analysis with examples:

 

Optical Character Recognition (OCR):

OCR is a crucial component of Document Image Analysis. It involves recognizing and converting text within an image into a machine-readable and editable format. OCR algorithms analyze the pixel patterns in the image to identify and recognize individual characters, words, or entire paragraphs. This enables the extraction of textual information from images, which can be further processed and utilized.

Example: Suppose you have a scanned image of a printed document. OCR algorithms can process the image and extract the text, making it possible to search, edit, or store the textual content in a digital format.
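
A minimal OCR sketch using the pytesseract wrapper around the open-source Tesseract engine (both must be installed; "scan.png" is a placeholder):

    import cv2
    import pytesseract

    img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
    text = pytesseract.image_to_string(img)
    print(text)  # recognized text, ready to search, edit, or store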

Layout Analysis:

Layout analysis is the process of understanding the structure and organization of the document. It involves identifying different components of the document, such as headers, footers, paragraphs, tables, and images. Layout analysis helps to segment the document into meaningful regions, which is essential for accurate text extraction and understanding the document's hierarchy.

Example: In a multi-page document, layout analysis can identify sections such as the title page, table of contents, chapters, and appendices, facilitating easy navigation and retrieval of specific information.

Text Extraction:

After performing OCR and layout analysis, the next step is text extraction. In this stage, the recognized text is extracted and organized in a way that preserves the original document's structure. This enables the computer to understand the content and context of the document.

Example: Consider an invoice document. Text extraction would involve capturing information such as the invoice number, date, billing details, and itemized list of products/services along with their corresponding prices.
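
Once OCR has produced raw text, fields like these are often pulled out with simple patterns. A minimal sketch (the label "Invoice No" and the sample text are hypothetical):

    import re

    ocr_text = "Invoice No: 12345\nDate: 2023-07-01\nTotal: $250.00"
    match = re.search(r"Invoice No:\s*(\d+)", ocr_text)
    if match:
        print(match.group(1))  # -> 12345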

Document Classification:

Document classification involves categorizing documents based on their content or type. This process is essential for efficiently managing and organizing large document repositories.

Example: A computer system can classify documents as invoices, contracts, reports, or letters based on their content and layout. This categorization allows for easy retrieval and management of documents based on their purpose.
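
A toy sketch of content-based classification using keyword matching (the categories and keywords are illustrative; production systems typically use trained classifiers):

    KEYWORDS = {
        "invoice":  ["invoice", "amount due", "bill to"],
        "contract": ["agreement", "party", "hereby"],
        "report":   ["summary", "findings", "conclusion"],
    }

    def classify(text: str) -> str:
        text = text.lower()
        # Pick the category whose keywords appear most often in the text.
        scores = {cat: sum(text.count(k) for k in kws)
                  for cat, kws in KEYWORDS.items()}
        return max(scores, key=scores.get)

    print(classify("This agreement is made between the party of..."))  # -> contract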

 

How computer vision is used in biometrics

 

Computer vision is extensively used in biometrics to analyze and recognize unique physical or behavioral characteristics of individuals for identity verification and authentication purposes. Biometrics leverages computer vision algorithms to extract and process features from biometric data, such as facial images, fingerprints, iris patterns, voice, and gait. These extracted features are then compared with stored templates to verify the identity of an individual. Here's how computer vision is used in some common biometric modalities:

  1. Facial Recognition: Facial recognition is one of the most widely used biometric modalities. Computer vision algorithms analyze facial features, such as the distance between the eyes, the shape of the nose, and the contours of the face. These features are converted into a mathematical representation known as a facial template. During the verification process, a live facial image is captured, and its template is compared with the template stored in the database. If the templates match within a certain threshold, the individual's identity is verified. (A minimal sketch of this template-matching step appears after this list.)
  2. Fingerprint Recognition: Computer vision techniques are employed to capture and process fingerprint images. The ridges and furrows on a fingerprint are analyzed, and unique patterns, known as minutiae points, are extracted. These minutiae points form a fingerprint template, which is used for matching during verification. Fingerprint recognition is widely used in various applications, such as unlocking smartphones, access control systems, and forensic investigations.
  3. Iris Recognition: Iris recognition involves analyzing the patterns in the colored part of the eye (the iris). Computer vision algorithms capture high-resolution images of the iris and extract distinctive features, such as radial furrows and crypts. These features are used to create an iris template, which is compared during authentication. Iris recognition is known for its high accuracy and is used in secure access control systems and border-crossing applications.
  4. Voice Recognition: In voice recognition, the speech signal is transformed into a spectrogram, an image-like representation that pattern-recognition algorithms can analyze. Relevant features reflecting the speaker's unique vocal-tract characteristics, including the shape of the larynx and mouth, are extracted to create a voice template. During verification, a live voice sample is compared with the stored template to determine the speaker's identity. Voice recognition is used in applications like voice authentication for phone banking and voice assistants.
  5. Gait Analysis: Gait analysis involves studying an individual's walking pattern using computer vision techniques. Various body movements, such as the stride length, walking speed, and angles of the limbs, are analyzed to create a gait template. Gait recognition is often used in surveillance systems for identifying individuals from a distance, even when their faces are not visible.
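
All five modalities share the same final step: compare a live feature vector against a stored template. The sketch below illustrates that step with made-up numbers; in practice the vectors would come from a face, fingerprint, or iris feature extractor, and the threshold would be tuned on real data:

    import numpy as np

    stored_template = np.array([0.12, 0.87, 0.45, 0.33])  # enrolled user
    live_sample     = np.array([0.10, 0.90, 0.44, 0.35])  # captured at login

    # Euclidean distance between the two feature vectors.
    distance = np.linalg.norm(stored_template - live_sample)

    THRESHOLD = 0.1  # illustrative value; real systems tune this carefully
    print("Verified" if distance < THRESHOLD else "Rejected")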

Computer vision plays a crucial role in ensuring the accuracy and efficiency of biometric systems, making them an integral part of various applications, including security, access control, and identity verification. As computer vision technology continues to advance, biometric systems are becoming even more reliable and widely adopted across different industries.

Computer Vision for Medical Image Analysis

Computer vision is a field of computer science that deals with the extraction of meaningful information from digital images or videos. Medical image analysis is the use of computer vision techniques to analyze medical images, such as X-rays, CT scans, and MRIs.

Computer vision for medical image analysis has a wide range of applications, including:

  • Disease detection and diagnosis: Computer vision algorithms can be used to detect and diagnose diseases in medical images. For example, computer vision algorithms have been used to detect cancer in mammograms, heart disease in cardiac CT scans, and pneumonia in chest X-rays.
  • Image segmentation: Image segmentation is the process of dividing an image into its constituent parts. Computer vision algorithms can be used to segment medical images into different tissues or organs. This can be used to help doctors identify and measure specific structures in an image.
  • Image registration: Image registration is the process of aligning two or more images of the same object. This can be used to track changes in an object over time or to compare images from different imaging modalities.
  • Surgery planning: Computer vision algorithms can be used to plan surgeries by creating 3D models of patient anatomy. This can help doctors to visualize the surgical procedure and to identify potential risks.
  • Surgical navigation: Computer vision can assist surgeons with real-time feedback and guidance during procedures by relating live imaging data to the patient's anatomy.

Computer vision for medical image analysis is a rapidly growing field with the potential to revolutionize healthcare. As computer vision techniques become more sophisticated, they will be able to automate many of the tasks that are currently performed by radiologists and other medical professionals. This will free up doctors to focus on more complex cases and will help to improve the accuracy and efficiency of medical diagnosis.

Here are some specific examples of how computer vision is being used in medical image analysis today:

  • Google DeepMind has developed an algorithm that reportedly detects diabetic retinopathy in eye scans with around 90% accuracy, described as an improvement over typical human graders, who achieve around 80%.
  • The company Arterys has developed an application that uses computer vision to analyze MRI scans of the heart, reportedly identifying coronary artery disease with around 95% accuracy, comparable to human cardiologists.
  • The company Zebra Medical Vision has developed an algorithm that reportedly detects cancer in mammograms with around 99% accuracy, compared with roughly 85% for human radiologists.

These are just a few examples of the many ways that computer vision is being used in medical image analysis today. As computer vision techniques continue to improve, we can expect to see even more applications of this technology in the future.

Face detection is the process of finding and locating faces in an image or video. Face detection algorithms typically identify faces by looking for certain facial features, such as the eyes, nose, and mouth. Once a face has been detected, the algorithm can then extract the face from the image or video.
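
A minimal face-detection sketch using the Haar cascade that ships with OpenCV ("group.jpg" is a placeholder input):

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    gray = cv2.imread("group.jpg", cv2.IMREAD_GRAYSCALE)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:               # one bounding box per detected face
        face_crop = gray[y:y + h, x:x + w]   # extract the face region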

Face recognition is the process of identifying a person's face from a database of known faces. Face recognition algorithms typically compare the features of a face in an image or video to the features of faces in the database. If the features match, the algorithm can then identify the person.

Face detection and face recognition are two important techniques that are used in a variety of applications, including:

  • Security: Face detection and face recognition can be used to identify people in security footage, such as at airports or in banks.
  • Access control: Face recognition can be used to control access to buildings or other restricted areas.
  • Biometric identification: Face recognition can be used to identify people for biometric identification purposes, such as for passports or driver's licenses.
  • Social media: Face recognition can be used to identify people in social media photos and videos.

Both face detection and face recognition are complex tasks that require a lot of data and computing power. However, the accuracy of these techniques has improved significantly in recent years. As a result, face detection and face recognition are becoming increasingly popular and are being used in a wider range of applications.

Here are some of the key differences between face detection and face recognition:

  • Goal: Face detection finds and locates faces in an image or video; face recognition identifies whose face it is by matching against a database of known faces.
  • Prerequisites: Face detection needs no prior knowledge of specific individuals; face recognition requires a database of enrolled faces.
  • Operating conditions: Face detection can run in real time and copes with a variety of lighting conditions; face recognition may be less accurate in low light.

Radiosity: the physics of image formation, radiance, irradiance, brightness, color


Radiosity is a technique in computer graphics that models the way light is reflected and diffused between surfaces in a scene. It is a global illumination algorithm, meaning it accounts for light reflected from all surfaces in the scene, not just light arriving directly from the light sources. This makes radiosity more realistic for diffusely lit scenes than classic ray tracing, which primarily follows direct light paths and mirror-like reflections.

The physics of image formation is the study of how light interacts with objects and surfaces to create an image. It is a complex topic, but some of the key concepts include:

  • Radiance: The radiant power per unit projected area, per unit solid angle (W·sr⁻¹·m⁻²). Radiance measures the amount of light travelling from a surface in a particular direction; spectral radiance additionally resolves it per unit wavelength.
  • Irradiance: The radiant flux per unit area. Irradiance is a measure of the amount of light that is incident on a surface.
  • Brightness: The perceived intensity of light. Brightness is a subjective measure, and it can be affected by factors such as the size of the light source, the distance to the light source, and the reflectance of the surface.
  • Color: The perception of different wavelengths of light. Color is also a subjective measure, and it can be affected by factors such as the brightness of the light, the surrounding colors, and the individual's perception of color.

Radiosity is used to calculate the brightness and color of surfaces in a scene by taking into account the radiance and irradiance of all the surfaces in the scene. This is done by solving the radiosity equation, which is a complex mathematical equation that models the way light is reflected and diffused between surfaces.
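
For a scene divided into n discrete patches, the classical radiosity equation takes the following standard textbook form:

    B_i = E_i + \rho_i \sum_{j=1}^{n} F_{ij} B_j

where B_i is the radiosity of patch i, E_i is the energy the patch emits itself, ρ_i is its reflectivity, and F_ij is the form factor: the fraction of energy leaving patch i that arrives at patch j. Solving this system of linear equations for all patches yields the light balance of the whole scene.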

Radiosity is a powerful technique for creating realistic images, but it can be computationally expensive. This is because it requires solving the radiosity equation for every surface in the scene. However, advances in computer hardware and software have made radiosity more affordable and accessible. As a result, radiosity is becoming increasingly popular in computer graphics applications, such as architectural visualization, product design, and video games.

Lab Syllabus
