We are very excited to announce the first fully stable release of the Matrix protocol and specification across all APIs – as well as the Synapse 1.0 reference implementation which implements the full Matrix 1.0 API surface.
This means that, just over 5 years after the initial work on Matrix began, we are proud to have finally exited beta!! This is the conclusion of the work which we announced at FOSDEM 2019 when we cut the first stable release of the Server-Server API and began the Synapse 0.99 release series in anticipation of releasing a 1.0.
Now, before you get too excited, it’s critical to understand that Matrix 1.0 is all about providing a stable, self-consistent, self-contained and secure version of the standard which anyone should be able to use to independently implement production-grade Matrix clients, servers, bots and bridges etc. It does not mean that all planned or possible features in Matrix are now specified and implemented, but that the most important core of the protocol is a well-defined stable platform for everyone to build on.
On the Synapse side, our focus has been exclusively on ensuring that Synapse correctly implements Matrix 1.0, to provide a stable and secure basis for participating in Matrix without risk of room corruption or other nastinesses. However, we have deliberately not focused on performance or features in the 1.0 release – so I’m afraid that Synapse’s RAM footprint will not have improved significantly, and your favourite long-awaited features (automatically defragmenting rooms with lots of forward extremities, configurable message retention, admin management web-interface etc) have not yet landed. In other words, this is the opposite of the Riot 1.0 release (where the entire app was redesigned and radically improved its performance and UX) – instead, we have adopted the mantra to make it work, make it work right, and then (finally) make it fast. You can read the full release notes here. It’s also worth looking at the full changelog through the Synapse 0.99 release series to see the massive amount of polishing that’s been going on here.
All this means that the main headline features which land in Matrix 1.0 are vitally important but relatively dry:
Using X.509 certificates to trust servers rather than perspective notaries, to simplify and improve server-side trust. This is a breaking change across Matrix, and we’ve given the community several months now to ensure their homeservers run a valid TLS certificate. See MSC1711 for full details, and the 2 week warning we gave. As of ~9am UTC today, the matrix.org homeserver is running Synapse 1.0 and enforcing valid TLS certificates – the transition has begun (and so far we haven’t spotted any major breakage :). Thank you to everyone who got ready in advance!
Using .well-known URIs to discover servers, in case you can’t get a valid TLS certificate for your server’s domain.
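For reference, the discovery mechanism serves a small JSON document at /.well-known/matrix/server on the server’s base domain, naming the host (and port) where the homeserver actually lives. A minimal sketch, with an illustrative hostname:

```json
{
    "m.server": "synapse.example.com:443"
}
```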
Switching to room version 4 by default for creating new rooms. This fixes the most important defects that the core room algorithm has historically encountered, particularly:
…and lots and lots and lots of bugfixes and spec omission fixes.
That said, there is a lot of really exciting stuff in flight right now which sadly didn’t stabilise in time for Matrix 1.0, but will be landing as fast as we can finalise it now that 1.0 is at last out the door. This includes:
Editable messages! (These are in Synapse 1.0 and Riot already, but still stabilising so not enabled by default)
Reactions! (Similarly these are in develop)
Threading!! (We’ve planted the seeds for this in the new ‘aggregations’ support which powers edits & reactions – but full thread support is still a bit further out).
Cross-signed verification for end-to-end encryption (This is on a branch, but due to land any day now). We’ve also held off merging E2E backups into the Matrix 1.0 spec until cross-signing lands, given it may change the backup behaviour a bit. Once this is done, we can seriously talk about turning on E2E by default everywhere.
Live-tracking of room statistics and state in Synapse! (This is in Synapse 1.0 already if you check out the new room_stats and room_state tables, but we need to provide a nice admin interface for it).
Support for smaller footprint homeservers by reducing memory usage and stopping them from joining overly complex rooms.
Then stuff which we haven’t yet started, but is now unlocked by the 1.0 release:
Fixing extremities build-up (and so massively improving performance)
Rewriting Communities. Groups/Communities deliberately didn’t land in Matrix 1.0 as the current implementation has issues we want to fix first. MSC1772 has the details.
Rewritten room directory using the new room stats/state tables to be super-speedy.
Just to give a quick taster of the shape of things to come, here’s RiotX/Android, the all-new Riot client for Android, showing off Edits & Reactions in the wild…
…and here’s a screenshot of the final test jig for cross-signing devices in end-to-end encryption, so you will never have to manually verify new devices for a trusted user ever again! We demoed a *very* early version of this at FOSDEM, but this here is the testing harness for the real deal, after several iterations of the spec and implementation to nail down the model. + means the device/user’s cross-signing key is trusted, T means it’s TOFU:
So, there you have it – welcome to Matrix 1.0, and we look forward to our backlog of feature work now landing!
Massive massive thanks to everyone who has stuck with the project over the years and helped support and grow Matrix – little did we think back in May 2014 that it’d take us this long to exit beta, but hopefully you’ll agree that it’s been worth it 🙂
Talking of which, we were looking through the photos we took from the first ever session hacking on Matrix back in May 2014…
…suffice it to say that of the architectural options, we went with #3 in the end…
…and that nowadays we actually know how power levels work, in excruciating and (hopefully) well-specified detail 🙂
There has been an absolutely enormous amount of work to pull Matrix 1.0 together – both on the spec side (thanks to the Spec Core Team for corralling proposals, and everyone who’s contributed proposals, and particularly to Travis for editing it all) and the implementation side (thanks to the whole Synapse team for the tedious task of cleaning up everything that was needed for 1.0). And of course, huge thanks go to everyone who has been helping test and debug the Synapse 1.0 release candidates, or just supporting the project to get to this point 🙂
The Matrix.org Foundation
Finally, as promised, alongside Matrix 1.0, we are very happy to announce the official launch of the finalised Matrix.org Foundation!
As of today the Foundation is finalised and operational, and all the assets for Matrix.org have been transferred from New Vector (the startup we formed in 2017 to hire the core Matrix team). In fact you may already have seen Matrix.org Foundation notices popping up all over the Matrix codebase (as all of New Vector’s work on the public Matrix codebase for the foreseeable future is being assigned to the Matrix.org Foundation).
Most importantly, we’re excited to introduce the Guardians of the Matrix.org Foundation. The Guardians are the legal directors of the non-profit Foundation, and are responsible for ensuring that the Foundation (and by extension the Spec Core Team) keeps on mission and neutrally protects the development of Matrix. Guardians are typically independent of the commercial Matrix ecosystem and may even not be members of today’s Matrix community, but are deeply aligned with the mission of the project. Guardians are selected to be respected and trusted by the wider community to uphold the guiding principles of the Foundation and keep the other Guardians honest.
We have started the Foundation with five Guardians – two being the original founders of the Matrix project (Matthew and Amandine) and three being entirely independent, thus ensuring the original Matrix team forms a minority which can be kept in check by the rest of the Guardians. The new Guardians are:
Prof. Jon Crowcroft – Marconi Professor of Communications Systems in the Computer Lab at the University of Cambridge and the Turing Institute. Jon is a pioneer in the field of decentralised communication, and a fellow of the Royal Society, the ACM, the British Computer Society, the Institution of Engineering and Technology, the Royal Academy of Engineering and the Institute of Electrical and Electronics Engineers.
Jon is a global expert in decentralisation and data privacy, and is excellently placed to help ensure Matrix stays true to its ideals.
Ross Schulman – Ross is a senior counsel and senior policy technologist at New America’s Open Technology Institute, where he focuses on internet measurement, emerging technologies, surveillance, and decentralization. Prior to joining OTI, Ross worked for Google.
Ross brings a unique perspective as a tech- and decentralisation-savvy lawyer to the Foundation, as well as being one of the first non-developers in the Matrix community to run his own homeserver. Ross has been known to walk around Mozfest clutching a battery-powered Synapse in a box, promoting decentralised communication for all.
Dr. Jutta Steiner – As co-founder and CEO of Parity Technologies, Jutta is dedicated to building a better internet – Web 3.0 – where users’ privacy & control come first. Parity Technologies is a leader in the blockchain space – known to many as the creator of one of the most popular Ethereum clients, it is also the creator of two ambitious new blockchain technologies, Polkadot and Substrate, that make it easier to experiment and innovate on scalability, encryption and governance.
Parity has been pioneering Matrix enterprise use since the moment they decided to rely on Matrix for their internal and external communication back in 2016, and now run their own high-volume deployment, with end-to-end encryption enabled by default. Jutta represents organisations who are professionally dependent on Matrix day-to-day, as well as bringing her unique experiences around decentralisation and ensuring that Web 3.0 will be a fair web for all.
We’d like to offer a very warm welcome to the new Guardians, and thank them profusely for giving up their time to join the Foundation and help ensure Matrix stays on course for the years to come.
For the full update on the Foundation, please check out the new website content at https://matrix.org/foundation which should tell you everything you could possibly want to know about the Foundation, the Guardians, the Foundation’s legal Articles of Association, and the day-to-day Rules which define the Open Governance process.
Matrix 1.0 has been a bit of an epic to release, but puts us on a much stronger footing for the future.
However, it’s very unlikely that we’d have made it this far if most of the core dev team wasn’t able to work on Matrix as their day job. Right now we are actively looking for large-scale donations to the Matrix.org Foundation (and/or investment in New Vector) to ensure that the team can maintain as tight a focus on core Matrix work as possible, and to ensure the project realises its full potential. While Matrix is growing faster than ever, this perversely means we have more and more distractions – whether that’s keeping the Matrix.org server safe and operational, or handling support requests from the community, or helping new members of the ecosystem get up and running. If you would like Matrix to succeed, please get in touch if you’d like to sponsor work, prioritise features, get support contracts, or otherwise support the project. We’re particularly interested in sponsorship around decentralised reputation work (e.g. publishing a global room directory which users can filter based on their preferences).
Finally, huge thanks to everyone who has continued to support us through thick and thin on Patreon, Liberapay or other platforms. Every little helps here, both in terms of practically keeping the lights on, and also inspiring larger donations & financial support.
So: thank you once again for flying Matrix. We hope you enjoy 1.0, and we look forward to everything else landing on the horizon!
At Uber, we have hundreds of internal web applications used by developers, product managers, and operations teams—essentially everyone at the company. Since all web applications work differently, it puts additional overhead on our employees to learn how to interact with them most effectively. This can lead to engineers spending a lot of time and effort reinventing the wheel instead of leveraging a universal design system across the company.
To solve these issues, Uber assembled a dedicated design and engineering team to come up with a universal system, which resulted in the Base Web design system. Open sourced in 2018, Base Web is a React component library implementing the Base design language that acts as a device-agnostic foundation for quickly and easily creating web applications.
What is a design system?
A design system is a set of reusable components that, in combination with a set of rules and design tokens (referred to as entities), stores visual design information, like colors or spacing, and enables you to build consistent and accessible applications quickly.
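To make the idea of design tokens concrete, here is a minimal sketch of what they might look like in code (the names and values are made up for illustration, not Base Web’s actual tokens):

```javascript
// Illustrative design tokens: named entities that store visual design
// information, which components read instead of hard-coding values.
const tokens = {
  colors: { primary: '#276EF1', negative: '#E11900' },
  sizing: { scale400: '12px', scale600: '20px' },
  typography: { fontFamily: 'system-ui, sans-serif' },
};

// A button that reads tokens stays visually consistent with every other
// component built on the same system.
function buttonStyle() {
  return { background: tokens.colors.primary, padding: tokens.sizing.scale400 };
}
```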
A design system serves as a common language between teams of engineers, designers, and product managers, making it easier for them to work together. It fosters productivity through this shared understanding of building blocks. Design systems also help onboard new engineers and designers—they can quickly go through all the possible components and design tokens used by a given engineering organization.
Introducing Base Web
Base Web is a foundation for initiating, evolving, and unifying web applications. It’s an open source toolkit of React components and utilities that align with the Base Design System–essentially, the designs translated into code. The project is engineered to be reliable, accessible, and extensively customizable.
By contributing our code publicly on GitHub, we are holding ourselves to a high standard and are dedicated to providing an open communication channel for our users, where they can suggest improvements and contribute back to Base Web. Each React component is screened by visual regression services on each commit to ensure pixel-perfect layout. We also test updates end-to-end using Puppeteer, a tool that provides high-level API control over the Chrome web browser. By leveraging both of these testing strategies, we can rest assured that code changes are meeting product requirements and not causing bugs.
User accessibility is incredibly important to Uber, and Base Web does a lot to ensure developers are given the tools to build products that work for all website visitors. For instance, drag-and-drop lists are notoriously difficult to implement because browsers provide little help when developers have to build drag-and-drop interactions. Developers using Base Web have the peace of mind that keyboard navigation is reliable and works well with screen readers. To ensure additional accessibility, Base Web leverages Styletron, which generates atomic styling so that web applications can download as little content as possible. Styletron is instrumental in optimizing Base Web for users on mobile devices and with poor network connectivity.
To account for the diversity of web applications, we built Base Web to be as customizable as possible. In many ways, you can think of the project as a ‘base’ on which you can easily create new design systems. The project provides a top level entry point to theme all design tokens, including colors, sizing, and typography. Not only can developers easily edit the visual elements of their web applications, but Base Web also incorporates an interface to override functionality.
The overrides pattern
Based on our team’s previous experience working with component libraries, one of our main goals with Base Web was to create web development software that would make it easy to reuse components. To accomplish this, we worked with Uber’s web engineers to detect the main pain points they had to deal with in their day-to-day engineering work. It was clear that having more control over a component was the most desired requirement.
Based on our research, we determined that the main parts of the React components engineers often needed access to were style customizations, the ability to pass through some custom properties to any element in a composable component, like Accessible Rich Internet Applications (ARIA), and the ability to modify the rendering of a component. As a result, we introduced a unified overrides API in the Base Web components.
Some of the benefits we experienced with the proposed overrides pattern are:
No top-level properties API overload
No extra properties proxying inconsistently across the composable components
Easily replaceable presentational components
Let’s take a look at what Base Web’s overrides API looks like in detail:
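As a rough sketch, assuming a Select-like component that exposes Root and Option identifiers (the component name, prop values, and replacement subcomponent here are all illustrative):

```javascript
// Stand-in for a custom React subcomponent replacing the default Option
// (purely illustrative; a real replacement would return a React element).
function CustomOption(props) {
  return { type: 'CustomOption', props };
}

const overrides = {
  Root: {
    // Extra props are spread onto the element, taking precedence over
    // the properties the component applies by default.
    props: { 'aria-label': 'Choose a city' },
    // Style overrides may be a function of the theme (and shared component
    // state); the result is deep-merged with the element's default styles.
    style: ({ $theme }) => ({ borderColor: $theme.colors.primary }),
  },
  Option: {
    // Passing a component value replaces the underlying element entirely.
    component: CustomOption,
  },
};

// Usage sketch (JSX): <Select options={...} overrides={overrides} />
```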
As depicted above, we provide an identifier for every underlying element in a component so it can be targeted through the overrides property. In the example above, there are two elements rendered and they are exposed as Root and Option in the overrides API.
For every element or component, we provide a way to pass in extra properties and styles, or completely replace the component. Extra properties are passed in as an object that is spread onto the JSX element taking precedence over other properties applied by default. Style overrides can be passed in two ways, as an object or as a function that accepts $theme and some shared component state properties, and returns a style object. The style overrides passed in are deep-merged with the default element’s styles.
We also provide a way to completely replace an underlying element or component by passing the replacements as a component value for the targeted identifier (the Option override is depicted in the example above). This can be useful if you’d like to add or change the functionality of a given subcomponent.
Internally, we created helpers to merge the overrides into the default or a custom styled element; check out our documentation to learn how to use them for your own projects.
Get started with Base Web
Open sourced in 2018 to enable others to experience the benefits of this solution, Base Web is now used across Uber, ensuring a seamless development experience across our web applications.
To get started with Base Web, head to our documentation site, and read through the Getting Started section, which covers:
At Uber, real-time analytics allow us to attain business insights and operational efficiency, enabling us to make data-driven decisions to improve experiences on the Uber platform. For example, our operations team relies on data to monitor the market health and spot potential issues on our platform; software powered by machine learning models leverages data to predict rider supply and driver demand; and data scientists use data to improve machine learning models for better forecasting.
In the past, we have utilized many third-party database solutions for real-time analytics, but none were able to simultaneously address all of our functional, scalability, performance, cost, and operational requirements.
Released in November 2018, AresDB is an open source, real-time analytics engine that leverages an unconventional power source, graphics processing units (GPUs), to enable our analytics to grow at scale. An emerging tool for real-time analytics, GPU technology has advanced significantly over the years, making it a perfect fit for real-time computation and data processing in parallel.
In the following sections, we describe the design of AresDB and how this powerful solution for real-time analytics has allowed us to unify, simplify, and improve Uber’s real-time analytics database solutions with greater performance and efficiency. After reading this article, we hope you try out AresDB for your own projects and find the tool useful for your own analytics needs, too!
Real-time analytics applications at Uber
Data analytics are crucial to the success of Uber’s business. Among other functions, these analytics are used to:
Make ad hoc queries to diagnose and troubleshoot business operations issues
We can summarize these functions into categories with different requirements as follows:
Ad hoc Queries
Dashboards and decision systems leverage real-time analytical systems to make similar queries over relatively small, yet highly valuable, subsets of data (with maximum data freshness) at high QPS and low latency.
The need for another analytical engine
The most common problem that real-time analytics solves at Uber is how to compute time series aggregates, calculations that give us insight into the user experience so we can improve our services accordingly. With these computations, we can request metrics by specific dimensions (such as day, hour, city ID, and trip status) over a time range on arbitrarily filtered (or sometimes joined) data. Over the years, Uber has deployed multiple solutions to solve this problem in different ways.
Some of the third-party solutions we’ve used for solving this type of problem include:
Apache Pinot, an open source distributed analytical database written in Java, can be leveraged for large-scale data analytics. Pinot employs a lambda architecture internally to query batch and real-time data in columnar storage, uses inverted bitmap index for filtering, and relies on star-tree for aggregate result caching. However, it does not support key-based deduplication, upsert, joins, and advanced query features such as geospatial filtering. In addition, being a JVM-based database, query execution on Pinot runs at a higher cost in terms of memory usage.
Elasticsearch is used at Uber for a variety of streaming analytics needs. It was built on Apache Lucene for full-text keyword search that stores documents and inverted index. It has been widely adopted and extended to also support aggregates. The inverted index enables filtering, yet it is not optimized for time range-based storage and filtering. It stores records as JSON documents, imposing additional storage and query access overhead. Like Pinot, Elasticsearch is a JVM-based database, and as such, does not support joins and its query execution runs at a higher memory cost.
While these technologies have strengths of their own, they lacked crucial functionalities for our use case. We needed a unified, simplified, and optimized solution, and thought outside-of-the-box (or rather, inside the GPU) to reach a solution.
Leveraging GPUs for real-time analytics
To render realistic views of images at a high frame rate, GPUs process a massive amount of geometries and pixels in parallel at high speed. While the clock-rate increase for processing units has plateaued over the past few years, the number of transistors on a chip has only increased per Moore’s law. As a result, GPU computation speeds, measured in Gigaflops per second (GFLOP/s), are rapidly increasing. Figure 1, below, depicts the theoretical GFLOP/s trend comparing NVIDIA GPUs and Intel CPUs over the years:
When designing our real-time analytics querying engine, integrating GPU processing was a natural fit. At Uber, the typical real-time analytical query processes a few days of data with millions to billions of records and then filters and aggregates them in a short amount of time. This computation task fits perfectly into the parallel processing model of general purpose GPUs because they:
Process data in parallel very quickly.
Deliver greater computation throughput (GFLOP/s), making them a good fit for heavy computation tasks (per unit data) that can be parallelized.
Offer greater compute-to-storage (ALU to GPU global memory) data access throughput (not latency) compared to central processing units (CPUs), making them ideal for processing I/O (memory)-bound parallel tasks that require a massive amount of data.
Once we settled on using a GPU-based analytical database, we assessed a few existing analytics solutions that leveraged GPUs for our needs:
Kinetica, a GPU-based analytics engine, was initially marketed towards U.S. military and intelligence applications in 2009. While it demonstrates the great potential of GPU technology in analytics, we found many key features missing for our use case, including schema alteration, partial insertion or updates, data compression, column-level memory/disk retention configuration, and join by geospatial relationships.
OmniSci, an open source, SQL-based query engine, seemed like a promising option, but as we evaluated the product, we realized that it did not have critical features for Uber’s use case, such as deduplication. While OmniSci open sourced their project in 2017, after some analysis of their C++-based solution, we concluded that neither contributing back nor forking their codebase was viable.
GPU-based real-time analytics engines, including GPUQP, CoGaDB, GPUDB, Ocelot, OmniDB, and Virginian, are frequently used by academic institutions. However, given their academic purpose, these solutions focus on developing algorithms and designing proof of concepts as opposed to handling real-world production scenarios. For this reason, we discounted them for our scope and scale.
Overall, these engines demonstrate the great advantage and potential of data processing using GPU technology, and they inspired us to build our own GPU-based, real-time analytics solution tailored to Uber’s needs. With these concepts in mind, we built and open sourced AresDB.
AresDB architecture overview
At a high level, AresDB stores most of its data in host memory (RAM that is connected to CPUs), handling data ingestion using CPUs and data recovery via disks. At query time, AresDB transfers data from host memory to GPU memory for parallel processing on GPU. As shown in Figure 2, below, AresDB consists of a memory store, a meta datastore, and a disk store:
Unlike most relational database management systems (RDBMSs), there is no database or schema scope in AresDB. All tables belong to the same scope in the same AresDB cluster/instance, enabling users to refer to them directly. Users store their data as fact tables and dimension tables.
A fact table stores an infinite stream of time series events. Users use a fact table to store events/facts that are happening in real time, and each event is associated with an event time, with the table often queried by the event time. An example of the type of information stored in fact tables is trips, where each trip is an event and the trip request time is often designated as the event time. In case an event has multiple timestamps associated with it, only one timestamp is designated as the time of the event displayed in the fact table.
A dimension table stores current properties for entities (including cities, clients, and drivers). For example, users can store city information, such as city name, time zone, and country, in a dimension table. Compared to fact tables, which grow infinitely over time, dimension tables are always bounded by size (e.g., for Uber, the cities table is bounded by the actual number of cities in the world). Dimension tables do not need a special time column.
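As an illustration, a fact table definition might be sketched roughly as follows (the field names and types here are approximate — consult the AresDB documentation for the exact schema format):

```json
{
    "name": "trips",
    "isFactTable": true,
    "columns": [
        {"name": "request_at", "type": "Uint32"},
        {"name": "city_id", "type": "Uint16"},
        {"name": "status", "type": "SmallEnum"}
    ],
    "primaryKeyColumns": [0]
}
```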
The table below details the current data types supported in AresDB:
With AresDB, strings are converted to enumerated types (enums) automatically before they enter the database for better storage and query efficiency. This allows case-sensitive equality checking, but does not support advanced operations such as concatenation, substrings, globs, and regex matching. We intend to add full string support in the future.
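A simplified illustration of the string-to-enum idea (not AresDB’s actual implementation, which is written in Go):

```javascript
// Sketch of automatic string-to-enum conversion at ingestion time:
// each distinct string gets a small integer id, so storage shrinks and
// equality checks compare integers instead of strings.
function makeEnumColumn() {
  const toEnum = new Map(); // string -> small integer id
  const fromEnum = [];      // integer id -> string
  return {
    encode(value) {
      if (!toEnum.has(value)) {         // first sighting: assign next id
        toEnum.set(value, fromEnum.length);
        fromEnum.push(value);
      }
      return toEnum.get(value);
    },
    decode(id) { return fromEnum[id]; },
  };
}

const status = makeEnumColumn();
const ids = ['completed', 'cancelled', 'completed'].map(s => status.encode(s));
// ids === [0, 1, 0]; the mapping is case-sensitive, so 'Completed'
// would receive a different id than 'completed'.
```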
AresDB’s architecture supports the following features:
Column-based storage with compression for storage efficiency (less memory usage in terms of bytes to store data) and query efficiency (less data transfer from CPU memory to GPU memory during querying)
Real-time upsert with primary key deduplication for high data accuracy and near real-time data freshness within seconds
GPU powered query processing for highly parallelized data processing powered by GPU, rendering low query latency (sub-seconds to seconds)
AresDB stores all data in a columnar format. The values of each column are stored as a columnar value vector. Validity/nullness of the values in each column is stored in a separate null vector, with the validity of each value represented by one bit.
AresDB stores uncompressed and unsorted columnar data (live vectors) in a live store. Data records in a live store are partitioned into (live) batches of configured capacity. New batches are created at ingestion, while old batches are purged after records are archived. A primary key index is used to locate the records for deduplication and updates. Figure 3, below, demonstrates how we organize live records and use a primary key value to locate them:
The values of each column within a batch are stored as a columnar vector. Validity/nullness of the values in each value vector is stored as a separate null vector, with the validity of each value represented by one bit. In Figure 4, below, we offer an example with five values for a city_id column:
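The value-vector/null-vector layout can be sketched as follows (a simplified illustration in JavaScript; AresDB itself is implemented in Go and C++/CUDA, and its actual memory layout may differ):

```javascript
// Build a value vector plus a null (validity) vector, one bit per value.
function buildColumn(values) {
  const valueVector = [];
  const nullVector = new Uint8Array(Math.ceil(values.length / 8));
  values.forEach((v, i) => {
    valueVector.push(v === null ? 0 : v);  // placeholder slot for nulls
    if (v !== null) nullVector[i >> 3] |= 1 << (i & 7); // set validity bit
  });
  return { valueVector, nullVector };
}

// e.g. five values of a city_id column, one of them null:
const col = buildColumn([12, 12, null, 15, 15]);
// bits 0, 1, 3, and 4 of the null vector are set: 0b00011011 === 27
```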
AresDB also stores mature, sorted, and compressed columnar data (archive vectors) in an archive store via fact tables. Records in the archive store are also partitioned into batches. Unlike live batches, an archive batch contains records of a particular Coordinated Universal Time (UTC) day. An archive batch uses the number of days since Unix Epoch as its batch ID.
Records are kept sorted according to a user configured column sort order. As depicted in Figure 5, below, we sort by city_id column first, followed by a status column:
The goals of the user-configured column sort order are to:
Maximize compression effects by sorting low cardinality columns earlier. Maximized compression increases storage efficiency (fewer bytes needed to store data) and query efficiency (fewer bytes transferred from CPU to GPU memory).
Allow cheap range-based prefiltering for common equi-filters such as city_id=12. Prefiltering enables us to minimize the bytes needed to be transferred from CPU memory to GPU memory, thereby maximizing query efficiency.
A column is compressed only if it appears in the user-configured sort order. We do not attempt to compress high cardinality columns because the amount of storage saved by compressing high cardinality columns is negligible.
After sorting, the data for each qualified column is compressed using a variation of run-length encoding. In addition to the value vector and null vector, we introduce the count vector to represent a repetition of the same value.
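The run-length idea can be sketched as follows (a simplified illustration — AresDB’s actual count vector layout may differ, e.g. by storing cumulative counts instead of per-run counts):

```javascript
// Collapse runs of repeated values (as produced by sorting) into a
// value vector plus a count vector.
function runLengthEncode(values) {
  const valueVector = [];
  const countVector = [];
  for (const v of values) {
    const last = valueVector.length - 1;
    if (last >= 0 && valueVector[last] === v) {
      countVector[last] += 1;   // extend the current run
    } else {
      valueVector.push(v);      // start a new run
      countVector.push(1);
    }
  }
  return { valueVector, countVector };
}

// A sorted city_id column compresses well:
const { valueVector, countVector } = runLengthEncode([1, 1, 1, 2, 2, 5]);
// valueVector → [1, 2, 5], countVector → [3, 2, 1]
```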
Real-time ingestion with upsert support
Clients ingest data through the ingestion HTTP API by posting an upsert batch. The upsert batch is a custom, serialized binary format that minimizes space overhead while still keeping the data randomly accessible.
When AresDB receives an upsert batch for ingestion, it first writes the upsert batch to redo logs for recovery. After an upsert batch is appended to the end of the redo log, AresDB then identifies and skips late records on fact tables for ingestion into the live store. A record is considered “late” if its event time is older than the archived cut-off event time. For records not considered “late,” AresDB uses the primary key index to locate the batch within the live store where they should be applied. As depicted in Figure 6, below, brand new records (not seen before based on the primary key value) will be applied to the empty space while existing records will be updated directly:
At ingestion time, records are either appended/updated in the live store or appended to a backfill queue waiting to be placed in the archive store.
We periodically run a scheduled process, referred to as archiving, that merges new live store records (records that have never been archived before) into the archive store. Archiving only processes live store records whose event time falls between the old cut-off time (the cut-off from the last archiving run) and the new cut-off time (derived from the archive delay setting in the table schema).
Since we batch archived data into daily batches, a record’s event time determines which archive batch it is merged into. Archiving does not require primary key index deduplication during merging, since only records between the old and new cut-off times are archived.
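The cut-off window and daily batching described above can be sketched as follows (the record layout is an assumption, as before):

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

def select_for_archiving(live_records, old_cutoff, new_cutoff):
    """Pick live-store records whose event time falls in
    [old_cutoff, new_cutoff) and group them into daily archive batches,
    keyed by day number."""
    batches = defaultdict(list)
    for r in live_records:
        if old_cutoff <= r["event_time"] < new_cutoff:
            batches[r["event_time"] // SECONDS_PER_DAY].append(r)
    return dict(batches)
```

Records newer than the new cut-off stay in the live store for a later run; records older than the old cut-off were either archived already or belong on the backfill path.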
Figure 7, below, depicts the timeline based on the given record’s event time:
In this scenario, the archiving interval is the time between two archiving runs, while the archiving delay is the duration after the event time but before an event can be archived. Both are defined in AresDB’s table schema configurations.
As shown in Figure 7, above, old records (with event time older than the archiving cut-off) for fact tables are appended to the backfill queue and eventually handled by the backfill process. This process is triggered either by time or when the backfill queue reaches its size threshold. Compared to live store ingestion, backfilling is asynchronous and relatively more expensive in terms of CPU and memory resources. Backfill is used in the following scenarios:
Handling occasional very late arrivals
Manual fixing of historical data from upstream
Populating historical data for recently added columns
Unlike archiving, backfilling is idempotent and requires primary key value-based deduplication. The data being backfilled will eventually be visible to queries.
The backfill queue is maintained in memory with a pre-configured size; during massive backfill loads, the client is blocked from proceeding until the queue is cleared by a backfill run.
With the current implementation, the user needs to use Ares Query Language (AQL), created by Uber, to run queries against AresDB. AQL is an effective time series analytical query language that does not follow the standard SELECT FROM WHERE GROUP BY syntax of SQL-like languages. Instead, an AQL query is specified in structured fields and can be carried in JSON, YAML, or Go objects. For instance, instead of SELECT count(*) FROM trips WHERE status = ‘completed’ AND request_at >= 1512000000 GROUP BY city_id, the equivalent AQL in JSON is written as:
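A JSON-structured AQL query of this shape might look like the following sketch (the field names are illustrative and should be checked against the open-source AresDB AQL documentation):

```json
{
  "table": "trips",
  "dimensions": [
    {"sqlExpression": "city_id"}
  ],
  "measures": [
    {"sqlExpression": "count(*)"}
  ],
  "rowFilters": [
    "status = 'completed'"
  ],
  "timeFilter": {
    "column": "request_at",
    "from": "1512000000"
  }
}
```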
In JSON format, AQL provides a better programmatic query experience than SQL for dashboard and decision system developers, because it allows them to easily compose and manipulate queries in code without worrying about issues like SQL injection. It serves as the universal query format on a typical architecture, from web browsers, through front-end and back-end servers, all the way back into the database (AresDB). In addition, AQL provides handy syntactic sugar for time filtering and bucketization, with native time zone support. The language also supports features like implicit sub-queries to avoid common query mistakes, and it makes query analysis and rewriting easy for back-end developers.
Despite the various benefits AQL provides, we are fully aware that most engineers are more familiar with SQL. Exposing a SQL interface for querying is one of the next steps that we will look into to enhance the AresDB user experience.
We depict the AQL query execution flow in Figure 8, below:
An AQL query is compiled into an internal query context. Expressions in filters, dimensions, and measurements are parsed into abstract syntax trees (ASTs) for later processing on the GPU.
AresDB utilizes pre-filters to cheaply filter archived data before sending it to a GPU for parallel processing. Since archived data is sorted according to a configured column order, some filters can exploit this order by applying binary search to locate the matching range. In particular, equi-filters on the first X sorted columns, optionally followed by a range filter on the (X+1)th sorted column, can be processed as pre-filters, as depicted in Figure 9, below:
After prefiltering, only the green values (those satisfying the filter condition) need to be pushed to the GPU for parallel processing. Input data is fed to the GPU and executed there one batch at a time, covering both live batches and archive batches.
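The binary-search pre-filter on a sorted column can be sketched with Python's bisect module; a plain list stands in here for what is really a compressed archive vector:

```python
import bisect

def equi_prefilter(sorted_column, value):
    """Locate the contiguous [lo, hi) row range matching an equi-filter
    on a sorted column via binary search, so that only this slice needs
    to be shipped to the GPU."""
    lo = bisect.bisect_left(sorted_column, value)
    hi = bisect.bisect_right(sorted_column, value)
    return lo, hi
```

An empty range (lo == hi) means the batch can be skipped entirely without touching the GPU.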
AresDB utilizes CUDA streams for pipelined data feeding and execution. Two streams are used alternately on each query for processing in two overlapping stages. In Figure 10, below, we offer a timeline illustration of this process:
For simplicity, AresDB utilizes the Thrust library to implement query execution procedures; its fine-tuned parallel algorithm building blocks allowed quick implementation of the current query engine.
In Thrust, input and output vector data is accessed using random access iterators. Each GPU thread seeks the input iterators to its workload position, reads the values and performs the computation, and then writes the result to the corresponding position on the output iterator.
AresDB follows the one-operator-per-kernel (OOPK) model for evaluating expressions.
Figure 11, below, demonstrates this procedure on an example AST, generated from a dimension expression request_at – request_at % 86400 in the query compilation stage:
In the OOPK model, the AresDB query engine traverses each leaf node of the AST and returns an iterator for its parent node. In cases where the root node is also a leaf, the root action is taken directly on the input iterator.
At each non-root, non-leaf node (the modulo operation in this example), a temporary scratch space vector is allocated to store the intermediate result produced by the request_at % 86400 expression. Leveraging Thrust, a kernel function is launched to compute the output for this operator on the GPU, storing the results in the scratch space iterator.
At the root node, a kernel function is launched in the same manner as a non-root, non-leaf node. Different output actions are taken based on the expression type, detailed below:
Filter action to reduce the cardinality of input vectors
Write dimension output to the dimension vector for later aggregation
Write measure output to the measure vector for later aggregation
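A toy version of the OOPK evaluation above can be written in Python, with a list comprehension standing in for a Thrust kernel launch; the AST node encoding is illustrative:

```python
def launch_kernel(op, left, right):
    """Stand-in for a Thrust transform launch: applies one operator
    element-wise over two input iterators, producing an output vector."""
    ops = {"%": lambda a, b: a % b, "-": lambda a, b: a - b}
    return [ops[op](l, r) for l, r in zip(left, right)]

def evaluate(node, columns):
    """One-operator-per-kernel evaluation of an AST node.

    A node is ("op", left, right) for an operator, a column name (str),
    or a constant (int). Each operator node gets its own 'kernel' and
    its own scratch-space output vector."""
    if isinstance(node, str):
        return columns[node]                  # leaf: column iterator
    if isinstance(node, int):
        n = len(next(iter(columns.values())))
        return [node] * n                     # leaf: constant iterator
    op, l, r = node
    left, right = evaluate(l, columns), evaluate(r, columns)
    return launch_kernel(op, left, right)     # scratch-space result
```

Evaluating request_at - request_at % 86400 this way launches one kernel for the modulo and a second for the subtraction, exactly as Figure 11 depicts.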
After expression evaluation, sorting and reduction are executed to conduct the final aggregation. In both operations, the values of the dimension vector serve as the sort and reduction keys, while the values of the measure vector are the values to aggregate on. In this way, rows with the same dimension values are grouped together and aggregated. Figure 12, below, depicts this sorting and reduction process:
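The sort-and-reduce stage behaves like Thrust's sort_by_key followed by reduce_by_key; in Python terms it can be sketched as follows (the GPU runs this as data-parallel primitives, not a loop):

```python
from itertools import groupby

def sort_and_reduce(dim_vector, measure_vector, agg=sum):
    """Group measure values by dimension key: sort rows by the dimension
    key, then reduce measures within each run of equal keys."""
    rows = sorted(zip(dim_vector, measure_vector), key=lambda r: r[0])
    return {key: agg(m for _, m in group)
            for key, group in groupby(rows, key=lambda r: r[0])}
```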
AresDB also supports the following advanced query features:
Join: currently AresDB supports hash join from fact table to dimension table
Geo Intersect: currently AresDB supports only intersect operations between GeoPoint and GeoShape
As an in-memory database, AresDB needs to manage the following six types of memory usage, allocated from both Golang and C:
Live Store Vectors (live store columnar data)
Archive Store Vectors (archive store columnar data), managed via load and eviction
Primary Key Index (hash table for record deduplication)
Backfill Queue (stores “late” arrival data awaiting backfill)
Archive / Backfill Process Temporary Storage (temporary memory allocated during the archiving and backfill processes)
Ingestion / Query Temporary Storage, covered by a statically configured estimate
When AresDB goes into production, it leverages a configured total memory budget. This budget is shared by all six memory types and should also leave enough room for the operating system and other processes. This budget also covers a statically configured overhead estimation, live data storage monitored by the server, and archived data that the server can decide to load and evict depending on the remaining memory budget.
Figure 13, below, depicts the AresDB host memory model:
AresDB allows users to configure pre-loading days and priority at the column level for fact tables, and only pre-loads archive data within those pre-loading days. Non-preloaded data is loaded into memory from disk on demand. When host memory is full, AresDB evicts archived data. Its eviction policies are based on the number of preloading days, column priorities, the day of the batch, and the column size.
AresDB manages multiple GPU devices through a device manager, which models GPU device resources in two dimensions, GPU threads and device memory, and tracks their usage while processing queries. After query compilation, AresDB estimates the amount of resources needed to execute the query. Device memory requirements must be satisfied before a query is allowed to start; if no device currently has enough free memory, the query must wait. Currently, AresDB can run one or several queries simultaneously on the same GPU device, as long as the device satisfies all resource requirements.
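The device manager's admission decision can be sketched roughly as below; the field names and two-dimensional resource model are simplified assumptions:

```python
def pick_device(devices, mem_needed, threads_needed):
    """Admission-control sketch: choose the first GPU with enough free
    memory and threads, reserving the resources; return None to signal
    that the query must wait for a device to free up."""
    for dev in devices:
        if dev["free_mem"] >= mem_needed and dev["free_threads"] >= threads_needed:
            dev["free_mem"] -= mem_needed          # reserve for the query
            dev["free_threads"] -= threads_needed
            return dev["id"]
    return None  # not enough resources anywhere: query waits
```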
In the current implementation, AresDB does not cache input data in device memory for reuse across multiple queries, as it targets queries on datasets that are constantly updated in real time and therefore hard to cache correctly. We intend to implement data caching in GPU memory in future iterations of AresDB, a step that will help optimize query performance.
Use Case: Uber’s Summary Dashboard
At Uber, we use AresDB to build dashboards for extracting real-time business insights. AresDB stores fresh raw events with constant updates and computes crucial metrics against them in sub-seconds using GPU power at low cost, so that users can work with the dashboards interactively. For example, anonymized trip data, which has a long lifespan in the datastore, is updated by multiple services, including our dispatch, payments, and ratings systems. To utilize trip data effectively, users slice and dice it along different dimensions to gain insights for real-time decisions.
Built on AresDB, Uber’s Summary Dashboard is a widely used analytics dashboard that teams across the company leverage to retrieve relevant product metrics and respond in real time to improve user experience.
To build the mock-up dashboard, above, we modeled the following tables:
Trips (fact table)
Cities (dimension table)
Table schemas in AresDB
To create the two modeled tables described above, we first need to create them in AresDB with the following schemas:
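AresDB table schemas are JSON documents; the snippet below is an illustrative sketch of what a trips fact table schema might look like. The column names, types, and field names here are assumptions and should be checked against the AresDB schema documentation:

```json
{
  "name": "trips",
  "isFactTable": true,
  "columns": [
    {"name": "request_at", "type": "Uint32"},
    {"name": "trip_id", "type": "UUID"},
    {"name": "city_id", "type": "Uint16"},
    {"name": "status", "type": "SmallEnum"},
    {"name": "fare", "type": "Float32"}
  ],
  "primaryKeyColumns": [1],
  "archivingSortColumns": [2, 3]
}
```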
As described in the schemas, the trips table is created as a fact table, representing trip events that happen in real time, while the cities table is created as a dimension table, storing information about actual cities.
After tables are created, users may leverage the AresDB client library to ingest data from an event bus such as Apache Kafka, or streaming or batch processing platforms such as Apache Flink or Apache Spark.
Sample queries against AresDB
In the mock-up dashboards, we chose two metrics as examples: total trip fare and active drivers. In the dashboard, users can filter the metrics by city, e.g., San Francisco. To draw the time series for these two metrics over the last 24 hours, as shown in the dashboards, we can run the following queries in AQL:
Total trips fare in San Francisco in the last 24 hours group by hours
Active drivers in San Francisco in the last 24 hours group by hours
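For the first of these metrics, the AQL might look roughly like the following sketch (field names such as timeBucketizer are illustrative and should be checked against the AQL documentation; the city ID for San Francisco is a placeholder):

```json
{
  "table": "trips",
  "dimensions": [
    {"sqlExpression": "request_at", "timeBucketizer": "hour"}
  ],
  "measures": [
    {"sqlExpression": "sum(fare)"}
  ],
  "rowFilters": [
    "city_id = 1",
    "status = 'completed'"
  ],
  "timeFilter": {
    "column": "request_at",
    "from": "24 hours ago"
  }
}
```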
In the above example, we demonstrated how to leverage AresDB to ingest raw events happening in real time within seconds and to issue arbitrary user queries against the data right away, computing metrics in sub-seconds. AresDB helps engineers easily build data products that extract metrics crucial to businesses requiring real-time insights for human or machine decisions.
AresDB is widely used at Uber to power our real-time data analytics dashboards, enabling us to make data-driven decisions at scale about myriad aspects of our business. By open sourcing this tool, we hope others in the community can leverage AresDB for their own analytics.
In the future, we intend to enhance the project with the following features:
Distributed design: We are working on building out the distributed design of AresDB, including replication, sharding management, and schema management to improve its scalability and reduce operational costs.
Developer support and tooling: Since open sourcing AresDB in November 2018, we have been working on building more intuitive tooling, refactoring code structures, and enriching documentation to improve the onboarding experience, enabling developers to quickly integrate AresDB to their analytics stack.
Expanding feature set: We also plan to expand our query feature set to include functionality such as window functions and nested loop joins, thereby allowing the tool to support more use cases.
Query engine optimization: We will also be looking into developing more advanced ways to optimize query performance, such as Low Level Virtual Machine (LLVM) and GPU memory caching.
AresDB is open sourced under the Apache License. We encourage you to try out AresDB and join our community.
If building large-scale, real-time data analytics technologies interests you, consider applying for a role on our team.
A huge thanks to the rest of Uber Real-time Streaming Analytics team: Steven Chen, David Chen, Xiang Fu, Shengyue Ji, Jian Shen, Jeremy Shi, Ze Wang, and David Wang.
The Microsoft Devices Team is excited to announce Project Mu, the open-source release of the Unified Extensible Firmware Interface (UEFI) core leveraged by Microsoft products including both Surface and the latest releases of Hyper-V. UEFI is system software that initializes hardware during the boot process and provides services for the operating system to load. Project Mu contributes numerous UEFI features targeted at modern Windows based PCs. It also demonstrates a code structure and development process for efficiently building scalable and serviceable firmware. These enhancements allow Project Mu devices to support Firmware as a Service (FaaS). Similar to Windows as a Service, Firmware as a Service optimizes UEFI and other system firmware for timely quality patches that keep firmware up to date and enables efficient development of post-launch features.
When first enabling FaaS on Surface, we learned that the open source UEFI implementation TianoCore was not optimized for rapid servicing across multiple product lines. We spent several product cycles iterating on FaaS, and have now published the result as free, open source Project Mu! We are hopeful that the ecosystem will incorporate these ideas and code, as well as provide us with ongoing feedback to continue improvements.
Project Mu includes:
A code structure & development process optimized for Firmware as a Service
An on-screen keyboard
Secure management of UEFI settings
Improved security by removing unnecessary legacy code, a practice known as attack surface reduction
Modern BIOS menu examples
Numerous tests & tools to analyze and optimize UEFI quality.
We look forward to engagements with the ecosystem as we continue to evolve and improve Project Mu to our mutual benefit!
In 1864 British computer pioneer Charles Babbage described the first key-value store. It was meant to be part of his Analytical Engine. Sadly, the Analytical Engine, which would have been the first programmable computer, was never built. But Babbage lays out clearly the design for his key-value store in his autobiography. He imagined a read-only store implemented as punched cards. He referred to these as Tables:
I explained that the Tables to be used must, of course, be computed and punched on cards by the machine, in which case they would undoubtedly be correct. I then added that when the machine wanted a tabular number, say the logarithm of a given number, that it would ring a bell and then stop itself. On this, the attendant would look at a certain part of the machine, and find that it wanted the logarithm of a given number, say of 2303. The attendant would then go to the drawer containing the pasteboard cards representing its table of logarithms. From amongst these he would take the required logarithmic card, and place it in the machine.
Punched card illustration from Babbage’s autobiography showing an integer key (2303) and value representing the decimal part of log10(2303) (0.3622939)
Upon this the engine would first ascertain whether the assistant had or had not given him the correct logarithm of the number; if so, it would use it and continue its work. But if the engine found the attendant had given him a wrong logarithm, it would then ring a louder bell, and stop itself. On the attendant again examining the engine, he would observe the words, “Wrong tabular number,” and then discover that he really had given the wrong logarithm, and of course he would have to replace it by the right one.
So, a key-value store (in this case mapping integers to their logarithm) implemented as external storage on punched cards with a human assistant as data bus. We’ve come a long way but key-value stores are as useful today as they were 150 years ago.
Today we’re announcing a native key-value store for Cloudflare Workers. We’re calling this Workers KV and this functionality is just the start of a sequence of announcements around storage and databases on the edge.
Values are written into Workers KV via the standard Cloudflare API and they are available within seconds at every one of Cloudflare’s 150+ global PoPs. Stephen and Zack’s technical post goes into more detail about how to use Workers KV. The values written are encrypted while at rest, in transit, and on local disk; they are only decrypted as needed.
Values can also be written from inside a Cloudflare Worker. Cloudflare takes care of synchronizing keys and values across our entire network.
So, what can you do with Workers KV?
Kenton Varda has taken the entire Cap’n Proto web site and implemented it using Workers and Workers KV. The entire site is served from a Worker that accesses static assets stored using Workers KV. That’s pretty neat… his site is entirely ‘serverless’, or rather ‘originless’.
Others might not want to go quite so far but here are some use cases:
Shopping cart storage: an e-commerce site can store a user’s entire shopping cart in a key/value pair using Workers KV. The e-commerce back end software can call the Cloudflare API to save (and retrieve, if necessary) the shopping cart contents. A Worker running on Cloudflare’s network can access the shopping cart from Workers KV and use it to show the visitor its contents and get them to checkout.
Storing the cart using Workers KV is more stable than trying to store it in the browser’s local storage (and can’t be cleared by the user) and completely offloads the storage from the e-commerce site’s engine.
A/B testing: a site can make a per-visitor decision about which profile to use for A/B testing and then store that profile information using Workers KV. This has significant speed advantages (since the A/B decision isn’t delayed until the page load in the browser).
Authentication verification can be handled in a Worker. For example, a user can log into a web site and details of the login (including the associated authentication token) can be pushed into a Worker KV value and then checked within a Worker. This offloads checking tokens from the backend server meaning that unauthenticated requests can be rejected quickly and authenticated requests passed through.
In fact, the entire authentication flow can be done in a Worker. A Worker can check credentials against an authentication service via an async API call, and then update a Worker KV value that gets replicated globally automatically.
Page construction can be performed in a Worker. A Worker can retrieve a template from a backend server or the cache and then fill in values on the page from Worker KV values. Those values can be updated rapidly by backend applications. This eliminates the need for solutions like Edge-Side Includes.
With the addition of Workers KV, Cloudflare Workers moves closer to being a complete compute platform embedded inside the Internet.
The Third Place for Code
Until the advent of services like Cloudflare Workers there were really two places to run code: on a server (perhaps rented from a cloud provider) and on the end-user’s client (which could be an IoT device, mobile phone or computer).
Servers and clients have very different properties. Server software is quick to update: if you want to change functionality, it can be done quickly because you control when server software is modified. On the other hand, client software can be slow to update: it might require a firmware flash on an IoT device, a download on a mobile app, or reinstallation on a computer. Even web-based software can be slow to update if the user keeps a session open for a long time.
Client software does have a massive latency advantage over server software: anything the user does with the client will be fast, because the latency between the eyeball and the CPU is very low. Servers tend to have high latency to the client, and that latency can vary widely from moment to moment and location to location.
Cloudflare Workers and similar services take the best of server software and client software. Workers are fast to update (10s of seconds for a global change), yet close to the end user (hence low latency). This gives developers a new way to think about building software: what should go on the server, what on the client and what in the Internet?
The rise of Functions-as-a-Service (FaaS) has caused some confusion, because anyone who has spent time with the lambda calculus or purely functional languages will know that keeping state around turns out to be very, very useful. Truly writing code in a functional style is a big commitment, and FaaS services really require state storage to fit in with the majority imperative style of programming.
Of course, FaaS software can store state in external services and use API calls to get and update it. But that’s likely slow and error prone. Cloudflare Workers KV introduces FaaS-native storage in the form of a key-value store. Workers KV can be written and read externally via the Cloudflare API, or internally directly from a Cloudflare Worker.
We can’t wait to see what people build with Cloudflare Workers and Cloudflare Workers KV. Get hacking! If you want to get started today sign up here.
P.S. — And if you’d like to replicate Babbage’s log table (without the need for a bell and an assistant) here’s how I built a Worker that retrieves the base-10 logarithm of an integer from a Worker KV. If the logarithm was missing it updates the Worker KV store with the computed log.
Step 1: Create a Namespace
Here CF_ACCT contains the Cloudflare account ID, CF_EMAIL the email address associated with the Cloudflare account, and CF_KEY the API key. This call creates a namespace called logcard and returns its unique ID (which is needed later).
Step 2: Upload Code and Bind a Variable to the Namespace
This curl uploads the script babbage.js containing the Worker and names it logcard. It binds the namespace created above to the variable LOGCARD. That variable can be used within the Worker to access the Workers KV namespace created above.
The code accesses the namespace and retrieves values with const retrieved = await LOGCARD.get(paddedInt). The full code for the Worker can be found here.
I’ve bound this Worker to the route /logcard on the site https://glowbeamtechnologies.com/. The Worker takes a value between 1 and 9999 in the URI parameter int and outputs a representation of one of Babbage’s punched cards containing the decimal part of the base-10 log of int.
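Stripped of the Workers plumbing, the Worker's get-or-compute logic can be sketched in Python, with a dict standing in for the LOGCARD namespace (the zero-padded key format is an assumption modeled on Babbage's cards):

```python
import math

def get_log(kv, n):
    """Look up the decimal part of log10(n); on a miss, compute it and
    write it back to the store so the next lookup is a hit."""
    key = f"{n:04d}"                             # zero-padded key
    value = kv.get(key)
    if value is None:
        # decimal part of the base-10 log, e.g. ".3622939" for 2303
        value = f"{math.log10(n) % 1:.7f}"[1:]
        kv[key] = value                          # backfill the store
    return value
```

In the real Worker the dict operations become `await LOGCARD.get(...)` and a write through the Workers KV API.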
Earlier this year, Google Cloud announced support for the Node.js 8 runtime in the App Engine standard environment. Since then, we’ve been excited to see how web developers have been using the platform.
One of the more interesting use cases that was unlocked with the new Node.js runtime for App Engine is the ability to run headless Chrome without having to do any setup or configuration. Previously, developers would need to run xvfb and X11, create a complicated Dockerfile, or configure OS dependencies themselves to get Chrome working on GCP—but not anymore.
Now, we’re pleased to announce that the Google Cloud Functions and Cloud Functions for Firebase Node.js 8 runtimes have also been upgraded with the OS packages required to run headless Chrome. This means that you can now author Cloud Functions that use headless Chrome—and utilize all the features of a web browser in a fully serverless environment. Headless Chrome lets you take advantage of the modern web platform features from Chromium and the Blink rendering engine too.
We have seen developers using headless Chrome for a variety of use cases:
Address Verification allows you to be sure you are securely communicating with the right person, while PGP support adds encrypted email interoperability.
Starting with the latest release of ProtonMail on web (v3.14), iOS and Android (v1.9), and the latest versions of the ProtonMail IMAP/SMTP Bridge, ProtonMail now supports Address Verification, along with full PGP interoperability and support. In this article, we’ll discuss these two new features in detail, and how they can dramatically improve email security and privacy.
When ProtonMail first launched in 2014, our goal was to make email encryption ubiquitous by making it easy enough for anybody to use. This is no easy feat, and that’s probably why it had never been done before. Our guiding philosophy is that the most secure systems in the world don’t actually benefit society if nobody can use them, and because of this, we made a number of design decisions for the sake of better usability.
One of these decisions was to make encryption key management automatic and invisible to the user. While this made it possible for millions of people around the world to start using encrypted email without any understanding of what an encryption key is, the resulting architecture required a certain level of trust in ProtonMail.
While a certain level of trust is always necessary when you use online services, our goal is to minimize the amount of trust required so that a compromise of ProtonMail doesn’t lead to a compromise of user communications. This is the philosophy behind our use of end-to-end encryption and zero-access encryption, and it is also the philosophy behind Address Verification.
Prior to the introduction of Address Verification, if ProtonMail were compromised, it would be possible to compromise user communications by sending the user a fake public encryption key. This could cause email communications to be encrypted in a way that an attacker holding the corresponding fake private key could intercept and decrypt (also known as a man-in-the-middle attack, or MITM), despite the fact that the encryption takes place client side.
Address Verification provides an elegant solution to this problem. We consider this to be an advanced security feature and probably not necessary for the casual user, but as there are journalists and activists using ProtonMail for highly sensitive communications, we have made adding Address Verification a priority.
How Address Verification works
Address Verification works by leveraging the Encrypted Contacts feature that we released previously. Starting with the latest version of ProtonMail, when you receive a message from a ProtonMail contact, you now have the option (in the ProtonMail web app) to Trust Public Keys for this contact. Doing so saves the public key for this contact into the encrypted contacts, and as contacts data is not only encrypted, but also digitally signed, it is not possible to tamper with the public encryption key once it has been trusted.
This means that when sending emails to this contact, it is no longer possible for a malicious third party (even ProtonMail) to trick you into using a malicious public key that is different from the one you have trusted. This allows for a much higher level of security between two parties than is possible with any other encrypted email service. You can learn more about using Address Verification in our knowledge base article.
However, for the many out there who still use PGP, the launch of full PGP support will make your life a lot easier. First, any ProtonMail user can now send PGP encrypted emails to non-ProtonMail users by importing the PGP public keys of those contacts. Second, it is also possible to receive PGP email at your ProtonMail account from any other PGP user in the world. You can now export your public key and share it with them.
Therefore, your ProtonMail account can in fact fully replace your existing PGP client. Instead of sharing your existing PGP public key, you can now share the PGP public key associated with your ProtonMail account and receive PGP encrypted emails directly in your ProtonMail account.
If you are an existing PGP user and you would like to keep your existing custom email address (e.g. email@example.com), we’ve got you covered there, too. It is possible to move your email hosting to ProtonMail and import your existing PGP keys for your address, so you don’t need to share new keys and a new email address with your contacts.
If you are using PGP for sensitive purposes, this might actually be preferable to continuing to use your existing PGP client. For one, PGP is fully integrated into ProtonMail, encryption/decryption is fully automated, and the new Address Verification feature is used to protect you against MITM attacks. More importantly though, ProtonMail is not susceptible to the eFail class of vulnerabilities, which have impacted many PGP clients, and our PGP implementations are being actively maintained.
Finally, we are formally launching a public key server to make key discovery easier than ever. If your contact is already using ProtonMail, then key discovery is automatic (and you can use Address Verification to make it even more secure if you want). But if a non-ProtonMail user (like a PGP user) wants to email you securely at your ProtonMail account, they need a way to discover your public encryption key. If they don’t get it from your public profile or website, they are generally out of luck.
Our public key server solves this problem by providing a centralized place to look up the public key of any ProtonMail address (and non-ProtonMail addresses hosted at ProtonMail).
Our public key server can be found at hkps://api.protonmail.ch. (Note: this endpoint serves HKP requests and cannot be browsed directly. However, to download the public key of a ProtonMail user from a browser, substitute the address you are looking for into the following link: https://api.protonmail.ch/pks/lookup?op=get&search=firstname.lastname@example.org)
Concluding thoughts on open standards and federation
Today, ProtonMail is the world’s most widely used email encryption system, and for most of our users the addition of Address Verification and PGP support will not change how you use ProtonMail. In particular, setting up PGP (generating encryption keys, sharing them, and getting your contacts to do the same) is simply too complicated, and it is far easier for most people to simply create a ProtonMail account and benefit from end-to-end encryption and zero-access encryption without worrying about details like key management.
Still, launching PGP support is important to us. The beauty of email is that it is federated, meaning that anybody can implement it. It is not controlled by any single entity, it is not centralized, and there is not a single point of failure. While this does constrain email in many ways, it has also made email the most widespread and most successful communication system ever devised.
PGP, because it is built on top of email, is therefore also a federated encryption system. Unlike other encrypted communications systems, such as Signal or Telegram, PGP doesn’t belong to anybody, there is no single central server, and you aren’t forced to use one service over another. We believe encrypted communications should be open and not a walled garden. ProtonMail is now interoperable with practically ANY other past, present, or future email system that supports the OpenPGP standard, and our implementation of this standard is also itself open source.
We still have a long way to go before we can make privacy accessible to everyone, and in the coming months and years we will be releasing many more features and products to make this possible. If you would like to support our mission, you can always donate or upgrade to a paid plan.