Microsoft Research Blog

  1. Figure 1: COMPASS is a general-purpose pretraining pipeline, which is trained on multimodal data, including RGB images, segmentation, depth and optical flow. The pretrained COMPASS model can be deployed to various downstream tasks of autonomous systems. In this work, we transfer COMPASS to drone navigation, car racing and visual odometry, which are deployed in very different environments and application scenarios.

    COMPASS: COntrastive Multimodal Pretraining for AutonomouS Systems

    Figure 1: COMPASS is a general-purpose pretraining pipeline, which is trained on multimodal data, including RGB images, depth and optical flow. The pretrained COMPASS model can be deployed on various downstream autonomous systems tasks. In this work, we test COMPASS on simulated drone navigation, car…
    February 23, 2022
  2. An illustration of the KEAR architecture represented by five panels side by side. The first contains an input question—“What is a treat that your dog will enjoy?”—and the answer choices “salad,” “petted,” “affection,” “bone,” and “lots of attention.” The second panel has three boxes, each representing retrieval from a specific knowledge source. A box labeled “Knowledge Graph” has a silhouette of a dog and, underneath it and labeled “desires,” a silhouette of a dog being petted; a heart representing “affection”; a bone; and clapping hands representing “lots of attention.” A box labeled “relevant questions” has the question “What do dogs like to eat?” and the accompanying answer “Bones.” A box labeled “dictionary” contains the definition of “bone”: “a composite material making up the skeleton of most vertebrates.” The third panel, labeled “concatenation with input,” contains the input question followed by “Dog, desires, bone. Dog, desires, lots of attention” followed by the relevant question and finally the dictionary definition of bone. In between each is a separation token [SEP]. The fourth panel is labeled “language model” and contains a quote box labeled “language services,” a cube labeled “model,” and left and right braces punctuation within a circle labeled “language understanding.” The fifth panel is labeled “output” and includes silhouettes of each of the five answer choices. The silhouette of the bone is highlighted in blue, representing the appropriate response.

    Azure AI milestone: Microsoft KEAR surpasses human performance on CommonsenseQA benchmark

    KEAR (Knowledgeable External Attention for commonsense Reasoning)—along with recent milestones in computer vision and neural text-to-speech—is part of a larger Azure AI mission to provide relevant, meaningful AI solutions and services that work better for people because they better capture how people learn and work—with improved vision, knowledge understanding,…
    December 20, 2021
  3. Collage of four images: 1) a VR haptic pivot device; 2) Ashley Llorens of Microsoft Research; 3) an image of a tractor on a farm; 4) an image of the Race and Technology lecture series speakers.

    Research at Microsoft 2021: Collaborating for real-world change

    Over the past 30 years, Microsoft Research has undergone a shift in how it approaches innovation, broadening its mission to include not only advancing the state of computing but also using technology to tackle some of the world’s most pressing challenges. That evolution has never…
    December 15, 2021
  4. a screen shot of a computer

    Azure AI milestone: New foundation model Florence v1.0 advances state of the art, topping popular computer vision leaderboards

    With the new computer vision foundation model Florence v1.0, the Project Florence team set a new state of the art on the popular leaderboards TextCaps Challenge 2021, nocaps, Kinetics-400/Kinetics-600 action classification, and OK-VQA Leaderboard. Florence v1.0—along with recent milestones in Neural…
    December 14, 2021 by Project Florence Team