Ballerina: A Data-Oriented Programming Language – InfoQ.com

May 10, 2022
By Yehonathan Sharvit, reviewed by Daniel Bryant
In the information systems I have built over the past decade, data is exchanged between programs like frontend applications, backend servers, and service workers. Those programs use exchange formats, like JSON, to communicate over the wire.
Over the years, I have noticed that a program’s complexity depends not only on the complexity of the business requirements but also on the approach I take to represent data inside my programs.
In statically-typed languages (like Java, C#, Go, OCaml, or Haskell), it seems natural to represent data with custom types or classes, while in dynamically-typed languages (like JavaScript, Ruby, Python, or Clojure), we usually use generic data structures, like maps and arrays.
Each approach has its benefits and costs. When we represent data with static types, we get great support from our IDE and safety from our type system, but it makes the code more verbose and the data model rigid.
On the other hand, in dynamically-typed languages, we represent data with flexible maps. This allows us to quickly write small to mid-sized programs without any ceremony, but we are operating in the wild. Our IDE doesn’t help us autocomplete field names, and when we mistype a field name, we only find out at run time.
Until I discovered Ballerina, I thought that this trade-off was an inherent part of programming that we were forced to live with. But I was wrong: it’s possible to combine the best of both worlds. It’s possible to move fast without compromising on safety and clarity. It’s possible to benefit from a flexible type system.

I cannot afford to walk, it's too slow.
I am scared to run, it's too risky.
I want to flow with ease and confidence. Like a ballerina.
When we write a program that manipulates data, it’s preferable to treat data as a first-class citizen. One of the privileges of first-class citizens is that they can be created without extra ceremony, just like numbers and strings. 
Unfortunately, in statically-typed languages, data doesn’t usually have the privilege of being created without ceremony. You need to use a named constructor to create data. When data is not nested, the absence of data literals is not too cumbersome: creating a library member named Kelly Kapowski who is 17 years old takes a single constructor call with three arguments.
But with nested data, the usage of named constructors becomes verbose. Once we include the list of books that Kelly currently holds, assuming a simplistic library data model where a book has only a title and an author, the Member constructor has to wrap a list of Book constructor calls, each of which wraps an Author constructor call.
In dynamically-typed languages, like JavaScript, the usage of data literals makes it much more natural to create nested data.
The problem with the dynamically-typed languages' approach to data is that data is untamed. The only thing that you know about your data is that it’s a nested map. As a result, you need to rely on documentation to know what kind of data you have in hand. 
The first thing I appreciated in Ballerina is that it gave me the ability to create my custom types while keeping the convenience of creating data via data literals. 
In Ballerina, like in a statically-typed language, we create our custom record types to represent our data model. Here is how we create Author, Book, and Member record types:
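A minimal sketch of these three record types, reconstructed from the data model described in the text. The field names age, title, author, and books are assumptions chosen to match the prose.

```ballerina
// An author has a first name and a last name.
type Author record {
    string firstName;
    string lastName;
};

// A book has only a title and an author (the simplistic model used in this article).
type Book record {
    string title;
    Author author;
};

// A library member has a name, an age, and the books currently held.
type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};
```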
And in Ballerina, like in dynamically-typed languages, we create data with data literals. 
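As an illustration, here is a sketch that creates Kelly with a nested record literal. The book title and its author are made-up sample values, and the record types repeat the assumed data model so that the snippet is self-contained.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

public function main() {
    // The literal is checked against Member, including the nested Book and Author.
    Member kelly = {
        firstName: "Kelly",
        lastName: "Kapowski",
        age: 17,
        books: [
            {
                title: "The Volleyball Handbook",
                author: {firstName: "Bob", lastName: "Miller"}
            }
        ]
    };
    io:println(kelly.books[0].author.lastName);
}
```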
Of course, like in a traditional statically-typed language, the type system lets us know when we have missed a field in a record. Our code won’t compile, and the compiler will tell us exactly why.
In VSCode, when the Ballerina extension is installed, you get notified about the missing field as you type.
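A small sketch of the missing-field case. The offending line is left commented out so that the snippet still compiles; the sample values are illustrative.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

public function main() {
    // Uncommenting the next line should make compilation fail, because the
    // required field lastName is missing from the literal:
    // Author incomplete = {firstName: "Bob"};

    Author complete = {firstName: "Bob", lastName: "Miller"};
    io:println(complete.lastName);
}
```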
Now, you’re probably asking yourself whether Ballerina’s type system is static or dynamic. Let’s take a look.
In a data-oriented program, enriching data with calculated fields is quite common. For example, suppose I want to enrich a piece of author data with a field called fullName that holds the author's full name. 
In a traditional statically-typed language, I’d need to create a new type for this enriched piece of data, maybe a new type called EnrichedAuthor. In Ballerina, that’s not required; the type system allows you to add record fields on the fly, using the bracket notation, like in a dynamically-typed language. For example, here is how we add a fullName field to an Author record:
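A minimal sketch of that on-the-fly enrichment, assuming the Author record type described earlier (an open record, which is the default when no closing bars are used).

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

public function main() {
    Author yehonathan = {firstName: "Yehonathan", lastName: "Sharvit"};

    // fullName is not declared in Author, but an open record accepts it
    // through the bracket notation.
    yehonathan["fullName"] = "Yehonathan Sharvit";

    io:println(yehonathan["fullName"]);
}
```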

I find this capability quite amazing. In a sense, Ballerina allows us — the developers — to have our cake and eat it too, by elegantly introducing a semantic difference between two different notations:
When we use the dot notation to access or modify a record field, Ballerina gives us the same safety and help we are used to in statically-typed languages.
When we use the bracket notation to access or modify a record field, Ballerina gives us the same flexibility we benefit from in dynamically-typed languages.
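The difference between the two notations can be sketched as follows. The variable names are illustrative; the point is the static type each access yields.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

public function main() {
    Author author = {firstName: "Bob", lastName: "Miller"};
    author["fullName"] = "Bob Miller";

    // Dot notation: only declared fields are allowed, and the result is a string.
    string first = author.firstName;

    // Bracket notation: an undeclared key is accepted, but the result is
    // typed loosely (anydata) because the compiler knows nothing about it.
    anydata full = author["fullName"];

    // author.fullName would not compile: fullName is not declared in Author.
    io:println(first, " / ", full);
}
```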
In some cases, we want to be stricter and disallow the addition of fields completely. No problem: Ballerina supports closed records. The syntax of closed records is similar to the syntax of open records, except that the field list is enclosed within two | characters. 
The type system doesn’t let you add a field to a closed record.
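A sketch of a closed record, using the hypothetical name ClosedAuthor. The rejected assignment is commented out so that the snippet compiles.

```ballerina
import ballerina/io;

// The {| ... |} delimiters make the record closed: no extra fields allowed.
type ClosedAuthor record {|
    string firstName;
    string lastName;
|};

public function main() {
    ClosedAuthor author = {firstName: "Bob", lastName: "Miller"};

    // Uncommenting the next line should be rejected at compile time,
    // since fullName is not one of the declared fields:
    // author["fullName"] = "Bob Miller";

    io:println(author);
}
```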
Ballerina also supports optional fields in records via the question mark sign. In the following record, the author’s first name is optional.
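A sketch of such a record, under the hypothetical name AuthorWithOptionalFirstName. Both literals below are valid, with or without the optional field.

```ballerina
import ballerina/io;

type AuthorWithOptionalFirstName record {
    string firstName?; // the trailing ? marks the field as optional
    string lastName;
};

public function main() {
    AuthorWithOptionalFirstName withFirstName = {firstName: "Bob", lastName: "Miller"};
    AuthorWithOptionalFirstName withoutFirstName = {lastName: "Miller"};

    io:println(withFirstName);
    io:println(withoutFirstName);
}
```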
When you access an optional field in a record, you need to make sure you properly handle the case where the field is not present. In traditional dynamically-typed languages, the absence of a static type checker makes it too easy to forget to handle that case. Tony Hoare introduced null references in 1965 in the ALGOL W programming language, and he later called them his billion-dollar mistake.
In Ballerina, the type system is there for you. Suppose you want to write a function that uppercases an author’s first name.
This code won’t compile: the type system (and the Ballerina VSCode Extension) will remind you that there is no guarantee that the optional field is there.
So how do we fix our code to handle the absence of the optional field properly? It’s quite simple; after you access the optional field you check if it’s there or not. In Ballerina, the absence of a field is represented by (). 
Note that no type casting is needed. The type system is smart enough to understand that the variable firstName is guaranteed to be a string after we have checked that firstName is not ().
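A sketch of the corrected function, assuming the record type with an optional firstName field described above. It reads the field with the optional field access operator (?.), which is one way to obtain a string? value; the function name is illustrative.

```ballerina
import ballerina/io;

type AuthorWithOptionalFirstName record {
    string firstName?;
    string lastName;
};

function upperCaseFirstName(AuthorWithOptionalFirstName author) {
    // Reading an optional field yields string? (a string or nil).
    // Calling toUpperAscii() directly on that value is what the compiler rejects.
    string? firstName = author?.firstName;
    if firstName is () {
        return; // nothing to uppercase
    }
    // From here on, firstName is narrowed to string; no cast is needed.
    author.firstName = firstName.toUpperAscii();
}

public function main() {
    AuthorWithOptionalFirstName bob = {firstName: "Bob", lastName: "Miller"};
    AuthorWithOptionalFirstName anonymous = {lastName: "Miller"};

    upperCaseFirstName(bob);
    upperCaseFirstName(anonymous);

    io:println(bob);
    io:println(anonymous);
}
```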
Another aspect of the Ballerina type system that I find very useful, in the context of data-oriented programming, is that record types are only defined via the structure of their fields. Let me clarify.
When we write a program that manipulates data, most of our codebase is made of functions that receive data and return data. Each function has requirements about the shape of the data it receives. 
In statically-typed languages, those requirements are expressed as types or classes. By looking at a function signature, you know exactly what the data shape of the function arguments is. The problem is that it sometimes creates a tight coupling between the code and the data. 
Let me give you an example. Suppose you want to write a function that returns the full name of an author, you would probably write something like this:
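A minimal sketch of such a function, tied to the Author record type. The sample values are illustrative.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

// Accepts Author records only.
function fullName(Author author) returns string {
    return author.firstName + " " + author.lastName;
}

public function main() {
    Author bob = {firstName: "Bob", lastName: "Miller"};
    io:println(fullName(bob));
}
```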
The limitation of this function is that it only works with records of type Author. I find it a bit disappointing that it doesn’t work with Member records. After all, a Member record also has firstName and lastName string fields.
Side Note: Some statically-typed languages allow you to overcome this limitation by creating data interfaces.
Dynamically-typed languages are much more flexible. In JavaScript, for instance, you would implement the function without declaring any parameter type: it simply concatenates the firstName and lastName properties of its author argument.
The function argument is named author, but in fact, it works with any piece of data that has firstName and lastName string fields. The problem is that when you pass a piece of data that doesn’t have one of these fields, you get a run-time exception. Moreover, the expected data shape of the function arguments is not expressed in the code. So, to know what kind of data the function expects, we have to either rely on documentation (which is not always up to date) or investigate the code of the function.
Ballerina’s flexible type system allows you to specify the shape of your function arguments, without compromising flexibility. You can create a new record type, which only mentions the record fields the function needs in order to work properly.
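A sketch of this idea, using a record type named Named (the name the article uses for this type later on). An Author value is accepted because it has the two fields that Named requires.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

// Mentions only the fields that fullName actually needs.
type Named record {
    string firstName;
    string lastName;
};

function fullName(Named a) returns string {
    return a.firstName + " " + a.lastName;
}

public function main() {
    Author bob = {firstName: "Bob", lastName: "Miller"};
    // Author is structurally compatible with Named, so this type-checks.
    io:println(fullName(bob));
}
```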

PRO TIP: You can use an anonymous record type to specify the shape of your function arguments.
You are free to call your function with any record that has the required fields, whether it’s a Member or an Author, or any other record that has the two string fields that the function expects. 
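The anonymous-record variant might look like the sketch below, called once with an Author and once with a Member. The record types repeat the data model assumed in this article, and the sample values are illustrative.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

// The parameter type is an anonymous record: no named type is needed.
function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

public function main() {
    Author bob = {firstName: "Bob", lastName: "Miller"};
    Member kelly = {firstName: "Kelly", lastName: "Kapowski", age: 17, books: []};

    // Both calls type-check, since both records have the two string fields.
    io:println(fullName(bob));
    io:println(fullName(kelly));
}
```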
Here is an analogy that I find useful to illustrate Ballerina’s approach to types: Types are like eyeglasses that we use in our programs to look at reality. But we need to remember that what we see through our lenses is only an aspect of reality. It is not the reality itself. Like the idiom says: the map is not the territory.
For instance, it is not accurate to say that what the function fullName (defined above) receives is a Named record. It is more accurate to say that the function fullName decides to look at the data it receives through the lenses of a Named record.
Let's look at another example. In Ballerina, two records of different types that have the exact same field values are considered equal.
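A sketch of this behavior with two structurally identical types, Author and Named. The values are illustrative.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Named record {
    string firstName;
    string lastName;
};

public function main() {
    Author author = {firstName: "Yehonathan", lastName: "Sharvit"};
    Named named = {firstName: "Yehonathan", lastName: "Sharvit"};

    // == compares the field values, not the declared types.
    io:println(author == named); // prints true
}
```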
At first, this behavior surprised me. How could two records of different types be considered equal? But when I thought about the eyeglasses analogy, it made sense to me:
The two types are two different lenses that are looking at the same reality. In our programs, what matters the most is the reality, not the lenses. Sometimes, traditional statically-typed languages seem to put more emphasis on the lenses than on reality.
So far, we have seen how Ballerina leverages types so that they are not in our way, but rather assist us on our way to make our development workflow more effective. Ballerina goes one step further and allows us to manipulate data in a powerful and convenient way via an expressive query language.
As a functional programming enthusiast, my “bread and butter” tools when I need to manipulate data are higher-order functions like map, filter, and reduce. Ballerina supports functional programming, but the idiomatic way to deal with data manipulation in Ballerina is via its expressive query language, which allows us to express business logic with eloquence.
Suppose we have a collection of records, and we only want to keep the records that satisfy a certain condition and enrich those records with a calculated field. For instance, let’s say we only want to keep books whose title contains the word “Volleyball”, and enrich them with the author's full name. 
Here is the function that enriches the Author record inside a book.
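A sketch of such an enrichment function, named enrichAuthor here as an assumption. Note that it adds the field to the record it receives, so the caller's record is mutated.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

// Adds a calculated fullName field to the author and returns the same record.
function enrichAuthor(Author author) returns Author {
    author["fullName"] = fullName(author);
    return author;
}

public function main() {
    Author bob = {firstName: "Bob", lastName: "Miller"};
    io:println(enrichAuthor(bob)["fullName"]);
}
```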
We could enrich our book collection using map, filter, and a couple of anonymous functions.
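A sketch of the map and filter version. The function name enrichBooks and the sample books are assumptions; each anonymous function has to spell out its parameter and return types.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

function enrichAuthor(Author author) returns Author {
    author["fullName"] = fullName(author);
    return author;
}

function enrichBooks(Book[] books) returns Book[] {
    return books.filter(function(Book book) returns boolean {
        // Keep only books whose title contains "Volleyball".
        return book.title.includes("Volleyball");
    }).map(function(Book book) returns Book {
        // Rebuild each remaining book with an enriched author.
        return {title: book.title, author: enrichAuthor(book.author)};
    });
}

public function main() {
    Book[] books = [
        {title: "The Volleyball Handbook", author: {firstName: "Bob", lastName: "Miller"}},
        {title: "Cooking Basics", author: {firstName: "Jane", lastName: "Doe"}}
    ];
    io:println(enrichBooks(books).length()); // prints 1
}
```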
But it’s quite verbose and a bit annoying to declare the types of the two anonymous functions. Using Ballerina query language, the code is more compact and easier to read.
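The same logic written as a query expression might look like this sketch. Names and sample data are the same assumptions as before; the where clause filters and the select clause builds each enriched book.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

function enrichAuthor(Author author) returns Author {
    author["fullName"] = fullName(author);
    return author;
}

function enrichBooks(Book[] books) returns Book[] {
    return from Book book in books
        where book.title.includes("Volleyball")
        select {title: book.title, author: enrichAuthor(book.author)};
}

public function main() {
    Book[] books = [
        {title: "The Volleyball Handbook", author: {firstName: "Bob", lastName: "Miller"}},
        {title: "Cooking Basics", author: {firstName: "Jane", lastName: "Doe"}}
    ];
    io:println(enrichBooks(books));
}
```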
Ballerina query language will be covered in greater detail in our Ballerina series.
Before we move forward and talk about JSON, let’s write a little unit test for our function. In Ballerina, records are considered equal when they have the same fields and values. So, it makes it straightforward to compare the data a function returns with the data we expect.
PRO TIP: Ballerina comes with an out of the box unit test framework.
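A sketch of such a unit test, using the ballerina/test module. The definitions under test are repeated so that the snippet is self-contained; in a real package, the test function would normally live in the tests directory next to the module. The function names and sample data are assumptions.

```ballerina
import ballerina/test;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

function enrichAuthor(Author author) returns Author {
    author["fullName"] = fullName(author);
    return author;
}

function enrichBooks(Book[] books) returns Book[] {
    return from Book book in books
        where book.title.includes("Volleyball")
        select {title: book.title, author: enrichAuthor(book.author)};
}

@test:Config {}
function enrichBooksKeepsOnlyVolleyballTitles() {
    Book[] books = [
        {title: "The Volleyball Handbook", author: {firstName: "Bob", lastName: "Miller"}},
        {title: "Cooking Basics", author: {firstName: "Jane", lastName: "Doe"}}
    ];

    // Build the expected result by hand, including the calculated field.
    Author expectedAuthor = {firstName: "Bob", lastName: "Miller"};
    expectedAuthor["fullName"] = "Bob Miller";
    Book[] expected = [{title: "The Volleyball Handbook", author: expectedAuthor}];

    // Records are compared by their fields and values.
    test:assertEquals(enrichBooks(books), expected);
}
```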
Now that we have seen the flexibility and ease that Ballerina provides around data representation and data manipulation inside a program, let’s see how Ballerina allows us to exchange data with other programs.
JSON is probably the most popular format for data exchange. Quite often, programs involved in information systems communicate by sending each other JSON strings. When a program needs to send data over the wire, it serializes a data structure into a JSON string. And when a program receives a JSON string, it needs to parse it to convert it to a data structure.
Ballerina, being a language designed for the cloud era, supports JSON serialization and JSON parsing out of the box. Any record can be serialized into a JSON string, as seen here:
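A minimal sketch of serialization with the built-in toJsonString function. The record types repeat the assumed data model, and the sample values are illustrative.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

public function main() {
    Member kelly = {
        firstName: "Kelly",
        lastName: "Kapowski",
        age: 17,
        books: [
            {title: "The Volleyball Handbook", author: {firstName: "Bob", lastName: "Miller"}}
        ]
    };

    // Serialize the whole nested record into a JSON string.
    string kellyJson = kelly.toJsonString();
    io:println(kellyJson);
}
```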
Conversely, a JSON string can be parsed into a record. Here, we need to be careful and make sure we handle the cases where the string is either not valid JSON or doesn’t conform to the data shape we expect.
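A sketch of parsing with explicit error handling. It assumes the fromJsonStringWithType function available in the Swan Lake releases current when this article was written; newer releases may offer other parsing APIs. The input string is sample data.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

public function main() {
    string kellyJson = string `{"firstName":"Kelly","lastName":"Kapowski","age":17,"books":[]}`;

    // The result is either a Member or an error (invalid JSON or wrong shape).
    Member|error kelly = kellyJson.fromJsonStringWithType(Member);

    if kelly is error {
        io:println("Could not parse member: ", kelly.message());
    } else {
        // Here kelly is narrowed to Member.
        io:println(kelly.firstName);
    }
}
```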
PRO TIP: Ballerina embraces errors and allows us to write the same logic more compactly via a special check construct.
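A sketch of the same parsing logic written with check, which returns the error to the caller when parsing fails. The function name parseMember is hypothetical, and fromJsonStringWithType is assumed as above.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

function parseMember(string memberJson) returns Member|error {
    // check unwraps the Member on success and returns the error otherwise.
    Member member = check memberJson.fromJsonStringWithType(Member);
    return member;
}

public function main() {
    Member|error ok = parseMember(string `{"firstName":"Kelly","lastName":"Kapowski","age":17,"books":[]}`);
    Member|error bad = parseMember("not json");
    io:println(ok is Member, " ", bad is error);
}
```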
Side Note: JSON support in Ballerina goes far beyond serialization and parsing. In fact, Ballerina comes with a json type that allows you to manipulate data exactly like in a dynamic language. Advanced JSON in Ballerina will be covered later in our Ballerina series.
We have explored the benefits Ballerina provides around data representation, data manipulation, and data communication. We are going to conclude our exploration with an example of a mini data-oriented program that illustrates those benefits.
Imagine we’re building a Library Management System made of multiple programs that exchange data about members, books, and authors. One of the programs is required to process member data: it enriches the member with a calculated full name field, keeps only the books whose titles contain “Volleyball”, and adds the author’s full name to each remaining book.

The program communicates over the wire using JSON: it receives the member data in JSON format and is expected to return it in JSON format.
Here is how the code for this program would look in Ballerina.
First, we create our custom record types. 
Then, a small utility function that calculates the full name of any record that has firstName and lastName string fields. We express this constraint using an anonymous record.
We use Ballerina query language to filter and enrich books:
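Now, we write our business logic: a function that enriches a Member record with the member’s full name, keeps only the books whose titles contain “Volleyball”, and adds the author’s full name to each remaining book.
Finally, we write the program entry point, which parses the JSON input into a Member record, calls the enrichment function, and serializes the result back to JSON.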
Note that we have to deal with the JSON string we receive being invalid. This is how it’s done:
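Putting the pieces together, the whole mini program might look like the sketch below. It is a reconstruction under the assumptions used throughout this article: the function names (enrichAuthor, enrichBooks, enrichMember, entryPoint), the field names, and the sample input are illustrative, not the author's published code.

```ballerina
import ballerina/io;

type Author record {
    string firstName;
    string lastName;
};

type Book record {
    string title;
    Author author;
};

type Member record {
    string firstName;
    string lastName;
    int age;
    Book[] books;
};

// Works with any record that has firstName and lastName string fields.
function fullName(record {string firstName; string lastName;} a) returns string {
    return a.firstName + " " + a.lastName;
}

function enrichAuthor(Author author) returns Author {
    author["fullName"] = fullName(author);
    return author;
}

function enrichBooks(Book[] books) returns Book[] {
    return from Book book in books
        where book.title.includes("Volleyball")
        select {title: book.title, author: enrichAuthor(book.author)};
}

// Business logic: full name for the member, filtered and enriched books.
function enrichMember(Member member) returns Member {
    member["fullName"] = fullName(member);
    member.books = enrichBooks(member.books);
    return member;
}

// Entry point: JSON in, JSON out. Invalid input is returned as an error.
function entryPoint(string memberJson) returns string|error {
    Member|error member = memberJson.fromJsonStringWithType(Member);
    if member is error {
        return member;
    }
    return enrichMember(member).toJsonString();
}

public function main() returns error? {
    string input = string `{"firstName":"Kelly","lastName":"Kapowski","age":17,
        "books":[{"title":"The Volleyball Handbook",
                  "author":{"firstName":"Bob","lastName":"Miller"}}]}`;
    io:println(check entryPoint(input));
}
```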
That’s it for the code that deals with the logic itself. You can find the complete code on GitHub.
To turn this into a real application, I would use one of the many protocols that Ballerina provides out of the box for communicating over the wire, like HTTP, GraphQL, Kafka, gRPC, WebSockets, and more.
While working on the code snippets that are presented in this article, I had the impression that I was re-experiencing the pleasant sensation that my IDE used to bring me when I was working on statically typed languages. I was surprised to discover that to enjoy this experience, this time I didn’t have to compromise on the power of expression and the flexibility I’d gotten addicted to since starting to work with dynamically-typed languages. 
The main thing that I’m missing in Ballerina is the ability to update a piece of data without mutating it, as I am used to in functional programming. I was not able to implement this capability as a custom function in Ballerina, as it requires support for handling generic types. But I do hope that in the near future this capability will be added to the language.

I see Ballerina as a general-purpose programming language, whose approach to data makes it a great fit for building information systems. In my opinion, this is due to Ballerina’s key values around data representation, data manipulation, and data communication.
You can learn more about Ballerina by visiting ballerina.io.
In the upcoming articles of our Ballerina series, we will cover additional aspects of Ballerina, like tables, advanced queries, error handling, maps, the json type, connectors, and more. You can subscribe to our newsletter to get notified when the next article in the Ballerina series is published.
