Building a modern file server in Go

Tai Vong
Apr 13, 2023

It has been a long time since I wrote the last article about how I built the file server for my company. The file server is meant to hide the underlying storage engine from its users, so that we can switch engines quickly and without them noticing. The previous version of the file server was kept simple, as it only served simple use cases such as uploading and downloading a whole file for processing.

Recently, I reworked it to support media delivery use cases. Apart from the raw implementation, this involved adopting several new technologies to improve file transfer and caching.

On cloud platform

Back in the day, file servers were usually set up on physical machines so that it was easy for us techies to manage them. This was great for backup purposes and manual operation. But now, our Go applications are small and portable, so we can deploy them anywhere using Docker. That means we can host our file server on various cloud platforms like GCP, AWS, and Azure. These platforms provide storage engines with flexible pricing plans that are more affordable and come with loads of new features like transcoding APIs, quick setup of a CDN, and backup and restore capabilities.

If you’re looking to build cloud-native applications, the Go Cloud Development Kit (CDK) can be an excellent tool. It allows developers to integrate their applications with cloud platforms in a way that isn’t tied to any specific platform or vendor. This means you can build your application once and deploy it anywhere, without worrying about any compatibility issues. We recently used Go CDK to work with cloud storage engines, and the experience was seamless.

For more details, check out this project: https://gocloud.dev/

Connecting

With the Go CDK, switching storage engines has become an easy task thanks to a handy interface. In fact, opening a bucket on any cloud provider, or even in a local environment, requires just a small piece of code.

import (
	"context"
	"fmt"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/<driver>"
)
...
bucket, err := blob.OpenBucket(context.Background(), "<driver-url>")
if err != nil {
	return fmt.Errorf("could not open bucket: %v", err)
}
defer bucket.Close()
// bucket is a *blob.Bucket; see usage below
...

The code is so generic that when switching platforms, we only need to replace the old bucket URL and driver with the new ones. Once we have established a connection to the bucket, we can easily implement many additional features.
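To illustrate why a URL is enough to select a backend, here is a toy sketch of scheme-based dispatch using only the standard library. It is not the Go CDK's implementation; the `Store` interface, `memStore`, `openers`, `OpenStore`, and `roundTrip` are all hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Store is a hypothetical, minimal storage interface for this sketch.
type Store interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// memStore keeps objects in a map; it plays the role of one "driver".
type memStore struct{ objects map[string][]byte }

func (m *memStore) Put(key string, data []byte) error {
	m.objects[key] = append([]byte(nil), data...) // store a copy
	return nil
}

func (m *memStore) Get(key string) ([]byte, error) {
	data, ok := m.objects[key]
	if !ok {
		return nil, errors.New("object not found: " + key)
	}
	return data, nil
}

// openers maps a URL scheme to a constructor, so callers choose a
// backend through the URL alone.
var openers = map[string]func(u *url.URL) (Store, error){
	"mem": func(*url.URL) (Store, error) {
		return &memStore{objects: map[string][]byte{}}, nil
	},
}

// OpenStore selects a backend from the scheme of rawURL.
func OpenStore(rawURL string) (Store, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	open, ok := openers[u.Scheme]
	if !ok {
		return nil, fmt.Errorf("no driver registered for scheme %q", u.Scheme)
	}
	return open(u)
}

// roundTrip stores a value and reads it back through the selected backend.
func roundTrip(rawURL, key, value string) (string, error) {
	store, err := OpenStore(rawURL)
	if err != nil {
		return "", err
	}
	if err := store.Put(key, []byte(value)); err != nil {
		return "", err
	}
	data, err := store.Get(key)
	return string(data), err
}

func main() {
	fmt.Println(roundTrip("mem://", "foo.txt", "hello"))
	_, err := OpenStore("s3://my-bucket") // no such driver in this toy registry
	fmt.Println(err)
}
```

The calling code never names a concrete backend, which is the property that makes swapping engines a configuration change rather than a code change.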

Uploading

The uploading process is quite straightforward. First, we create an index entry for the file, which includes important details such as the file name, size, and format. This allows us to easily manage and access the file later on.

Next, we retrieve the file data from the multipart form of the upload request. After that, we create a new file object and copy the file data to the object stream. This stream is then used to upload the file to the cloud storage. Finally, we close the stream to finalize the upload process.

// Store the file information in the file indexer for later access (omitted)

// Get the file reader from the multipart request
formFile, _, err := req.FormFile("my_file")
if err != nil {
	return err
}
defer formFile.Close()

// Open a new writer for the object content
objectWriter, err := bucket.NewWriter(ctx, fileName, nil)
if err != nil {
	return err
}

// Copy the uploaded data into the object
if _, err = io.Copy(objectWriter, formFile); err != nil {
	// Close is still required; cancelling ctx beforehand would abort the write
	objectWriter.Close()
	return err
}

// Close finalizes the upload, so its error must be checked
if err := objectWriter.Close(); err != nil {
	return err
}
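To try the copy step without a cloud bucket, the following is a minimal sketch that uses only the standard library. A `bytes.Buffer` stands in for the bucket writer, and the helper names (`saveUpload`, `newUploadRequest`, `uploadedBytes`) are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// saveUpload copies the multipart file field "my_file" from req into dst.
// dst is a stand-in for the bucket writer; any io.Writer works.
func saveUpload(req *http.Request, dst io.Writer) (int64, error) {
	formFile, _, err := req.FormFile("my_file")
	if err != nil {
		return 0, err
	}
	defer formFile.Close()
	return io.Copy(dst, formFile)
}

// newUploadRequest builds an in-memory multipart POST request.
func newUploadRequest(name string, content []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile("my_file", name)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(content); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := mw.Close(); err != nil {
		return nil, err
	}
	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

// uploadedBytes runs the whole round trip and returns what reached dst.
// It returns an empty string if any step fails.
func uploadedBytes(content string) string {
	req, err := newUploadRequest("hello.txt", []byte(content))
	if err != nil {
		return ""
	}
	var dst bytes.Buffer
	if _, err := saveUpload(req, &dst); err != nil {
		return ""
	}
	return dst.String()
}

func main() {
	fmt.Println(uploadedBytes("hello, file server"))
}
```

Because `saveUpload` only depends on `io.Writer`, the same function could be handed a bucket writer in place of the buffer.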

Downloading

Downloads play a crucial role in delivering content to users. To improve caching and delivery, HTTP defines conditional request headers that let a client revalidate a cached copy instead of downloading it again. Caches can also act without explicit instructions: if no Cache-Control header is given, a response may still be stored and reused when certain conditions are met (for example, when a Last-Modified header is present). This mechanism is called heuristic caching. Let’s take a look at the headers involved.

Last-Modified

The HTTP header Last-Modified tells the client when a resource was last modified. This information is used to check whether the resource has been updated since it was last fetched. On a later request, the client sends an If-Modified-Since or If-Unmodified-Since header that carries the Last-Modified value it received earlier. With If-Modified-Since, if the resource has not been modified since then, the server responds with a 304 (Not Modified) status code and the client uses its cached version. This helps to reduce bandwidth and improve overall performance.

Last-Modified: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT

The headers If-Modified-Since and If-Unmodified-Since make a request conditional. With If-Modified-Since, the server returns 200 (OK) and the file only if it was modified after the provided date; otherwise it returns 304 (Not Modified) and no body. This is commonly used for local caching.

If-Unmodified-Since is the inverse condition. The server returns the resource or accepts the modification only if the resource has not been updated since the date specified in the header. If the resource has been updated, the response is 412 (Precondition Failed).
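The If-Modified-Since check described above can be sketched by hand with the standard library. The modification time, the handler, and the `lastModifiedStatus` helper are hypothetical; the sketch only covers GET with If-Modified-Since.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// resourceModTime is a hypothetical last-modification time for the resource.
var resourceModTime = time.Date(2023, time.April, 13, 10, 0, 0, 0, time.UTC)

// lastModifiedHandler sets Last-Modified and answers 304 when the
// client's cached copy is still current.
func lastModifiedHandler(w http.ResponseWriter, req *http.Request) {
	w.Header().Set("Last-Modified", resourceModTime.Format(http.TimeFormat))
	since, err := http.ParseTime(req.Header.Get("If-Modified-Since"))
	if err == nil {
		// HTTP dates have one-second resolution, so compare truncated times.
		if !resourceModTime.Truncate(time.Second).After(since) {
			w.WriteHeader(http.StatusNotModified)
			return
		}
	}
	fmt.Fprint(w, "file content")
}

// lastModifiedStatus returns the status code for a GET carrying the given
// If-Modified-Since value (an empty string means no conditional header).
func lastModifiedStatus(ifModifiedSince string) int {
	req := httptest.NewRequest(http.MethodGet, "/file", nil)
	if ifModifiedSince != "" {
		req.Header.Set("If-Modified-Since", ifModifiedSince)
	}
	rec := httptest.NewRecorder()
	lastModifiedHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(lastModifiedStatus(""))                              // 200
	fmt.Println(lastModifiedStatus("Thu, 13 Apr 2023 10:00:00 GMT")) // 304
	fmt.Println(lastModifiedStatus("Wed, 12 Apr 2023 10:00:00 GMT")) // 200
}
```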

ETag

The ETag header provides a way for web servers to make caching more efficient and reduce bandwidth usage. It is similar to the Last-Modified header in that it allows the server to determine if a resource has been modified, but it is more accurate since it identifies a specific version of the resource instead of relying on a timestamp with one-second resolution. This identifier is generated by the server and is typically based on the content of the resource. In addition to improving caching, ETags can also help avoid mid-air collisions when multiple requests try to update the same resource simultaneously.

ETag: W/"<etag_value>"
ETag: "<etag_value>"

The If-Match header is used by the server to check if the requested resource matches any of the ETag values listed in the header. If there is a match, the server will return the requested resource for GET or HEAD, or process the update for non-safe methods. If there is no match, a 412 (Precondition Failed) response is returned.

On the other hand, If-None-Match works in the opposite way. The condition holds when none of the listed ETag values matches the current resource. In that case, the content is returned or the update is processed. If there is a match, a 304 (Not Modified) response is returned for GET and HEAD requests, and 412 (Precondition Failed) is used for the rest.
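Both ETag conditions can be sketched with a small in-memory resource. This is a simplified illustration: it compares a single ETag value exactly and ignores lists, `*`, and weak validators. The `document` type and the `etagFor` and `do` helpers are hypothetical, and hashing the content with SHA-256 is just one possible way to derive an ETag.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// etagFor derives a strong ETag from the content (one possible scheme).
func etagFor(content []byte) string {
	return fmt.Sprintf(`"%x"`, sha256.Sum256(content))
}

// document is a hypothetical in-memory resource guarded by its ETag.
type document struct{ content []byte }

func (d *document) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	current := etagFor(d.content)
	switch req.Method {
	case http.MethodGet:
		w.Header().Set("ETag", current)
		// The client's cached version is still current: skip the body.
		if req.Header.Get("If-None-Match") == current {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Write(d.content)
	case http.MethodPut:
		// Reject the update if the client edited a stale version.
		if m := req.Header.Get("If-Match"); m != "" && m != current {
			w.WriteHeader(http.StatusPreconditionFailed)
			return
		}
		body, err := io.ReadAll(req.Body)
		if err != nil {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		d.content = body
		w.WriteHeader(http.StatusNoContent)
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// do sends one request with a single optional header and returns the status.
func do(d *document, method, header, value, body string) int {
	req := httptest.NewRequest(method, "/doc", strings.NewReader(body))
	if header != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	d.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	d := &document{content: []byte("v1")}
	old := etagFor(d.content)
	fmt.Println(do(d, http.MethodGet, "If-None-Match", old, "")) // 304
	fmt.Println(do(d, http.MethodPut, "If-Match", old, "v2"))    // 204
	fmt.Println(do(d, http.MethodPut, "If-Match", old, "v3"))    // 412
	fmt.Println(do(d, http.MethodGet, "If-None-Match", old, "")) // 200
}
```

The third call shows the mid-air collision case: the second writer still holds the ETag of "v1", so its update is rejected instead of silently overwriting "v2".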

Range

The Range header is used to request only a part of a document. When several ranges are requested, the server may respond with them in a multipart response. If the server does not support the Range header, it ignores it and returns the whole document with a 200 (OK) status code. If the requested ranges are valid, the server will respond with a 206 (Partial Content) status code. If the requested ranges are invalid, the server will respond with a 416 (Range Not Satisfiable) error.

The If-Range header extends the range request with conditionals. If the time or ETag requested in the If-Range header matches, the server will return the requested ranges. Otherwise, it will send back the full resource with a 200 (OK) status code.
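As a rough illustration of how these status codes come about, here is a sketch that resolves a single byte range against a resource of known size. `resolveRange` is a hypothetical helper; it deliberately ignores multi-range requests and If-Range, and treats a malformed header as if no range had been sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// resolveRange interprets a single-range header such as "bytes=0-4",
// "bytes=5-" or "bytes=-3" for a resource of the given size. It returns
// the status code and, for 206 and 416, the Content-Range value.
func resolveRange(header string, size int64) (int, string) {
	if !strings.HasPrefix(header, "bytes=") {
		return http.StatusOK, "" // no usable range: send the whole resource
	}
	spec := strings.TrimPrefix(header, "bytes=")
	first, last, found := strings.Cut(spec, "-")
	if !found || strings.Contains(spec, ",") {
		return http.StatusOK, "" // malformed or multi-range: not handled here
	}
	unsatisfiable := fmt.Sprintf("bytes */%d", size)

	var start, end int64
	if first == "" {
		// Suffix form "bytes=-N": the last N bytes.
		n, err := strconv.ParseInt(last, 10, 64)
		if err != nil || n < 0 {
			return http.StatusOK, ""
		}
		if n == 0 || size == 0 {
			return http.StatusRequestedRangeNotSatisfiable, unsatisfiable
		}
		if n > size {
			n = size
		}
		start, end = size-n, size-1
	} else {
		var err error
		start, err = strconv.ParseInt(first, 10, 64)
		if err != nil || start < 0 {
			return http.StatusOK, ""
		}
		if start >= size {
			return http.StatusRequestedRangeNotSatisfiable, unsatisfiable
		}
		end = size - 1 // open-ended form "bytes=N-"
		if last != "" {
			e, err := strconv.ParseInt(last, 10, 64)
			if err != nil || e < start {
				return http.StatusOK, ""
			}
			if e < end {
				end = e
			}
		}
	}
	return http.StatusPartialContent, fmt.Sprintf("bytes %d-%d/%d", start, end, size)
}

func main() {
	fmt.Println(resolveRange("bytes=0-4", 10))   // 206 bytes 0-4/10
	fmt.Println(resolveRange("bytes=5-", 10))    // 206 bytes 5-9/10
	fmt.Println(resolveRange("bytes=-3", 10))    // 206 bytes 7-9/10
	fmt.Println(resolveRange("bytes=20-30", 10)) // 416 bytes */10
}
```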

Go Implementation

As modern applications rely on technologies that optimize bandwidth consumption, supporting these headers is crucial. However, implementing and maintaining all of them by hand can be difficult. Luckily, Go’s net/http package has a solution: the ServeContent function. It handles these headers automatically, so we do not have to write the conditional and range logic ourselves.

ServeContent replies to the request using the content in the provided ReadSeeker. The main benefit of ServeContent over io.Copy is that it handles Range requests properly, sets the MIME type, and handles If-Match, If-Unmodified-Since, If-None-Match, If-Modified-Since, and If-Range requests.
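This behaviour can be observed without any cloud storage. The sketch below serves an in-memory `bytes.Reader` through `http.ServeContent` and issues a few requests against it; the content, modification time, ETag value, and the `serveDemo` and `request` helpers are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

var (
	demoContent = []byte("0123456789")
	demoModTime = time.Date(2023, time.April, 13, 10, 0, 0, 0, time.UTC)
)

// serveDemo hands an in-memory ReadSeeker to http.ServeContent.
func serveDemo(w http.ResponseWriter, req *http.Request) {
	// ServeContent uses an ETag set on the response for If-Match,
	// If-None-Match and If-Range handling.
	w.Header().Set("Etag", `"v1"`)
	http.ServeContent(w, req, "demo.txt", demoModTime, bytes.NewReader(demoContent))
}

// request performs a GET with one optional header and returns the
// status code and the response body.
func request(header, value string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/demo.txt", nil)
	if header != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	serveDemo(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(request("", ""))                 // 200 0123456789
	fmt.Println(request("Range", "bytes=2-5"))   // 206 2345
	fmt.Println(request("If-None-Match", `"v1"`)) // 304 with an empty body

	code, _ := request("Range", "bytes=50-")
	fmt.Println(code) // 416
}
```

None of the conditional or range logic lives in the handler; it only supplies the validators (ETag and modification time) and a seekable reader.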

In the past, serving an object stored in Cloud Storage this way required a lot of effort, because the object reader could not seek and a ReadSeeker had to be built around it. Recent Go CDK releases support seeking on top of range reads, so the reader returned by the bucket can be passed to ServeContent as the ReadSeeker directly.

fileName := "foo.txt"

// Retrieve the file attributes
attr, err := bucket.Attributes(ctx, fileName)
if err != nil {
	return err
}

// Open the key "foo.txt" for reading with the default options.
r, err := bucket.NewReader(ctx, fileName, nil)
if err != nil {
	return err
}
defer r.Close()

// Readers also have a limited view of the blob's metadata.
fmt.Println("Content-Type:", r.ContentType())

// Set the headers for the file response
if attr.CacheControl != "" {
	w.Header().Set("Cache-Control", attr.CacheControl)
}
if attr.ContentDisposition != "" {
	w.Header().Set("Content-Disposition", attr.ContentDisposition)
}
if attr.ContentEncoding != "" {
	w.Header().Set("Content-Encoding", attr.ContentEncoding)
}
if attr.ContentLanguage != "" {
	w.Header().Set("Content-Language", attr.ContentLanguage)
}
if attr.ContentType != "" {
	w.Header().Set("Content-Type", attr.ContentType)
}
if attr.ETag != "" {
	w.Header().Set("Etag", attr.ETag)
}

// Serve the content; req is the incoming *http.Request
http.ServeContent(w, req, fileName, attr.ModTime, r)

The process is easy to follow. Firstly, we retrieve information about the file saved in a cloud storage bucket using Go CDK. Then, we create a reader to read the contents of the file. After that, we set some headers in the response to match the attributes of the file such as content type, encoding, language, and ETag. Lastly, we serve the contents of the file in an HTTP response using the http.ServeContent function.

Conclusion

Creating a file server within a microservices ecosystem provides many benefits, such as better security, insulation from changes in the storage engine, and seamless integration with existing cloud platforms. The Go CDK, an open-source project from the Go team, is a collection of tools for integrating with well-known cloud platforms, and it makes building file servers much easier. In this article, I showed you how to use the Go CDK and Go’s standard HTTP package to build a modern file server from scratch.
