Lucid Operations is dedicated to bringing clarity to such questions.

Email me if you'd like me to help bring clarity to your organization around payments, operations, or data management. Go to the about page for my statement of purpose and my background.

Recent Posts


Four Levels of Clarity

Introduction

This is a talk I’ve given several times. It outlines my way of categorizing an organization’s level of clarity around its status, goals, and progress. I use a sequence of light metaphors: no light, flashlight, headlight, and searchlight. My goal as a consultant is to bring your organization to the next level of clarity.

No Light

The first stage is no light. This is the worst position an organization can be in and, fortunately, is pretty uncommon. Imagine you are trying to get somewhere in the woods on a cloudy night but you have no light. In this stage of clarity, you have no awareness of your surroundings and when you trip and fall, you don’t really know why; you just pick yourself up and continue to feel your way around.

When you’re in this situation, your highest priority should be gaining visibility into what’s going on. Make diagrams, build monitoring and alerting, track projects and progress. Getting visibility into your current situation and understanding what systems you have currently deployed, how they are functioning, and what errors they are producing gets you to the next level of clarity, flashlight.

Flashlight

Once you have a flashlight in the dark you are much better off than before: you can see the tree before you run into it, and (if you are doing a good job of scanning) you’ll see the roots before they trip you. Many tech organizations I’ve worked with fall into this category. For this you need:

  • a relatively clear picture of your current deployments and systems
  • monitoring and alerting around production functionality
  • up-to-date documentation on development and support systems

If you’re missing any of these pieces, then I wouldn’t really consider you at complete flashlight level of clarity yet; that is, there are definitely gaps where things could jump out at you and you’d have to feel around to fix the problem.

When you’re in this situation, you have good visibility into what’s happening right now and can respond to issues as they come up. But at this point you are only responding to events as they occur. You don’t have visibility into longer term issues and challenges. To get to the next level of clarity, you’ll need to understand your trajectory and make plans. Once you have action plans and are tracking progress you will have entered the next level of clarity, headlight.

Headlight

Once you have short term plans and are tracking progress, you are at the headlight stage. I would say that the vast majority of my clients fall somewhere around here, and during certain phases of your growth, this is exactly where you want to be. You are making progress, becoming aware of challenges and obstacles before they actually hit you, and are able to plan ahead for capacity (both meat-space and digital-space). For this you need (in addition to all of the above):

  • a clear picture of current development and production capacity and projections for the immediate future,
  • proactive monitoring and alerting around production functionality which warns you of situations before they arise (alerts which trigger as a system is approaching failure, not after it fails),
  • minimal manual steps during your development and production lifecycle, enabling you to change direction (or go back) quickly.
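
As a rough illustration of an alert which triggers as a system approaches failure rather than after it fails, here is a minimal Python sketch. It fits a straight line to recent usage samples and warns when the projected time until full drops below a threshold. The names (Sample, hours_until_full, should_warn) and the 72 hour default are assumptions made up for this example, and a linear trend is only one simple way to project usage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    hour: float           # hours since an arbitrary reference point
    used_fraction: float  # 0.0 (empty) to 1.0 (full)


def hours_until_full(samples: Sequence[Sample]) -> Optional[float]:
    """Estimate hours until usage reaches 100% using a least-squares line.

    Returns None when there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_x = sum(s.hour for s in samples) / n
    mean_y = sum(s.used_fraction for s in samples) / n
    var_x = sum((s.hour - mean_x) ** 2 for s in samples)
    if var_x == 0:
        return None
    cov_xy = sum((s.hour - mean_x) * (s.used_fraction - mean_y) for s in samples)
    slope = cov_xy / var_x  # growth in used fraction per hour
    if slope <= 0:
        return None
    # Project forward from the most recent sample.
    latest = max(samples, key=lambda s: s.hour)
    return max((1.0 - latest.used_fraction) / slope, 0.0)


def should_warn(samples: Sequence[Sample], warn_within_hours: float = 72.0) -> bool:
    """Warn while there is still time to act, not after the failure."""
    eta = hours_until_full(samples)
    return eta is not None and eta <= warn_within_hours
```

With samples of 70%, 75%, and 80% usage taken 24 hours apart, the projection is roughly 96 hours until full, so a 72 hour threshold stays quiet while a 120 hour threshold would already warn.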

Although the majority of my clients fall somewhere around here, most of them didn’t meet the full complement of bullets above. I do think that you should be striving to meet all of the bullets above; doing so will enable you to make progress (and make projections around that progress) and not be surprised (too often) by issues which suddenly appear. For key areas of your business, once you do have all the above, you may need to reach for the next level of clarity.

Searchlight

To be honest, I have never worked in or with an organization which was entirely at this level of clarity. Some teams within Facebook, in particular the data group and the data center group, certainly were here. Mostly it’s an aspirational goal: you are not only aware of your short term challenges and issues before they affect your deliveries, you also have a clear picture of your realistic 1-2 year objectives. Think of it as being on a road with a map and a spotter with a searchlight above and ahead of you. To be here you need:

  • a picture of your long term objectives and goals, with your progress towards them tracked,
  • a good idea of the knowns and unknowns on your path towards those goals,
  • active plans and tracking towards clarifying that path.

Most organizations probably don’t need to be at this level of clarity; because the business and technical environment can change so rapidly, attempting to be here all the time across all functionality is a waste of time and resources. However, for those pieces of your business upon which you absolutely depend or which stand at a key fulcrum which can lever your business from profitable to unprofitable (I would say that the two groups I called out for Facebook definitely fall into this category), you want to be at this level of clarity. To get here, investing in day-to-day and month-to-month planning and tracking is not enough. You also need to:

  • invest in potential big win projects which may not pay off for several months or even years,
  • have a map of your objectives and your progress towards them,
  • be continuously updating and evaluating all of the above.

Conclusion

As I mentioned in the introduction, I’ve given this talk several times to prospective clients, underlining my goal of bringing the organization to the next level of enlightenment and I’ve had the good fortune to work with several organizations where we got to the next level. In particular I specialize in:

  • Payment systems,
  • complex computing challenges,
  • operations and systems architecture.

If your company/organization/engineers are working to achieve the next level of clarity, I can help with understanding and focusing your organization and achieving the next level of lucidity.

Send your thoughts, feedback, interest by emailing me at dlee@lucid-operations.com or

Why Are Payment Systems Hard?

Introduction

This post is about how payment systems are different from most software systems that engineers and architects will encounter. In most systems, the challenges are in getting the right APIs and documentation, scalability, availability, etc. With payment systems, however, the problems multiply and what seems at first to be a straightforward solution (“We just store the orders and keep them updated, right?”) can be quickly overwhelmed by corner cases, error recovery, and reporting requirements. This post will go over some of the dimensions often overlooked by first-time payment system engineers.

The five dimensions along which payment systems differ from pretty much every other computer system are:

  • Customer Expectations
  • Payment System Complexity
  • Regulatory Requirements
  • Accounting and Reporting Requirements
  • Foreign Exchange Requirements

This list is not exhaustive, but it covers the high points. Each of these sections could be expanded into a separate article, which is something I may do in the future.

I revised this article to add in a couple of sections on taxation and PCI compliance.

Customer Expectations

“My post is gone!” “My app just died!” “My picture got posted twice!” Such complaints are myriad and plague most of the software that is out there. The total amount of angst and anger directed at developers over such bugs is large. But the magnitude and the intensity of such anger are orders of magnitude less than that created when payments are lost, someone is double charged, refunds are late, etc.

The irritation of someone who has lost their text or a click is but a droplet in the raging maelstrom of someone who has been charged twice for an item he/she has not received (or cannot receive). Consumers’ tolerance for errors and mistakes in payment systems should be considered close to nil. As such, the time available to respond is shorter, the accuracy and correctness of code need to be much higher, and the penalties for failure are much higher still.

It is an order of magnitude harder to get payment systems to work as well as other systems. Yet they need to work better because of the expectations of all parties involved. When things go well it’s “you’re just doing your job”. And when things go badly (as they invariably do, even in the best of systems), the amount of shit dumped on the developers is an order of magnitude higher. That’s the reason that more than 90% of the developers I know who have worked in payments never go back. The risk/reward just isn’t there: you get paid about the same for work that is harder, appreciation for success is about the same, and the penalty for failure is higher.

Payment System Complexity

Most APIs used throughout the industry were created in the last decade or so and seldom exceed a few dozen calls and return conditions. The payments industry’s APIs have been built up over decades and are many (many) layers deep. And unlike most APIs, they do not involve a single party: there are often 4 or 5 legal entities involved in a single Credit Card (CC) AUTH request.

  • Buyer: The person buying the item with the CC
  • Issuing Bank: The bank which issued the CC
  • Network: Visa or MC or other Association
  • Acquiring Bank: The bank which set up the merchant
  • Seller: The merchant selling the item

Most non-payment systems involve a single party (or at most two parties) and yet the myriad failure, recovery, verification, and validation pathways can become overwhelming if not managed properly. Now imagine multiplying the number of entities by 2 or 3. Add to that the fact that transactions are initially completed synchronously but stretch over time, with potential future actions (chargeback?) for each transaction.

And this system wasn’t built in the last decade. No, this system was built up over the last century. This is why most payment systems have response code listings that number in the hundreds.

Some things often overlooked by new payments developers are:

  • Did you know that you can get a successful but partial auth?
  • Should you/can you split payment methods?
  • What’s the difference between void vs refund vs reversal?
  • What happens in your system with multiple partial refunds?
  • How should you account for chargebacks occurring after settlement? After closing?
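
To make the question about multiple partial refunds concrete, here is a minimal Python sketch of one way to keep a running refundable balance so that a series of partial refunds can never exceed what was captured. The class and field names (CapturedPayment, RefundError, refundable_cents) are hypothetical, not taken from any processor’s API, and amounts are held as integer cents to avoid floating point rounding.

```python
from dataclasses import dataclass, field
from typing import List


class RefundError(ValueError):
    """Raised when a refund request cannot be honored."""


@dataclass
class CapturedPayment:
    payment_id: str
    captured_cents: int
    refunds_cents: List[int] = field(default_factory=list)

    @property
    def refunded_cents(self) -> int:
        # Total of every partial refund issued so far.
        return sum(self.refunds_cents)

    @property
    def refundable_cents(self) -> int:
        # What is still left to refund against the original capture.
        return self.captured_cents - self.refunded_cents

    def refund(self, amount_cents: int) -> int:
        """Record a partial refund and return the remaining refundable amount."""
        if amount_cents <= 0:
            raise RefundError("refund amount must be positive")
        if amount_cents > self.refundable_cents:
            raise RefundError(
                f"cannot refund {amount_cents}; only {self.refundable_cents} remains"
            )
        self.refunds_cents.append(amount_cents)
        return self.refundable_cents
```

Keeping each refund as its own entry, rather than overwriting a single "refunded" field, is a deliberate choice here: it preserves the history you will later need for reporting and for answering customer questions.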

Regulatory Requirements

This section will focus on U.S. laws and regulations, but most nations will have a comparable degree of complexity. Here’s a partial list of a few regulations which govern payments for U.S. companies:

Office of Foreign Assets Control (OFAC) is the general term for what is actually a whole host of agencies, laws, regulations, and executive orders which prohibit economic interaction with specific entities and countries. It (mostly) boils down to a list of individuals, organizations, and countries with whom U.S. corporations are prohibited from doing business. Failing to comply with OFAC lists can result in fines, freezing of assets, and significant federal penalties. Never heard of OFAC (as in “Oh Fack, we forgot to restrict buyers from North Korea?!”)? I guarantee that your payment processor has.

The Sarbanes-Oxley Act, often reduced to “SOX”. Most of the requirements around SOX compliance for programmers center on record keeping, accountability, and limiting access. The requirements around reporting and accountability put very specific limitations on how records can be altered and the amount of access developers can have over production payment systems. A most basic example is that auditors will frown upon the software engineers who write the software also having control of production systems.

Generally Accepted Accounting Principles aka GAAP. For developers, this mostly boils down to being able to answer the questions: “where is this money?” and “who owns this money?” This is actually a good place to segue into reporting, which I cover in its own section below.

PCI DSS compliance. In a rare moment of cooperation, the major card companies got together to form the Payment Card Industry Security Standards Council, which maintains and publishes the Data Security Standard. It specifies requirements for the storage, transmission, and other handling of payments data. If you are storing any payments related data, you need to be aware of it and (at a minimum) will need to do a self-assessment of your setup.

Sales and VAT Tax. If you are selling any kinds of goods or services, you need to be aware of your tax collection and remittance obligations. Regulations on how taxes need to be displayed on receipts also vary by location. And depending on the goods you are selling, you may need to verify and manage sales and VAT tax exemption numbers.

Reporting, Accounting, and Finance

Accounting and finance are often spoken of together but they are actually separate functions. Most of the interaction programmers have with your accounting and finance departments will be through reporting. It would be impossible to give even the most basic overview of the full practice of accounting and finance here, so I will just touch on some of the complexities in reporting which are often overlooked by newly minted payments engineers.

Let’s say that your company W sells widgets. You have a website that lets people select the number of widgets to buy and they (virtually) swipe a CC to buy said widgets, which you then ship from a warehouse somewhere. Here is the basic sequence of events around a single purchase and partial return (note to experts out there: I know that what follows simplifies and glosses over some of the steps):

  1. Buyer creates a cart with four widgets for a total of $20
  2. Buyer enters CC information which is used to authorize $20
  3. Warehouse ships 4 widgets to user
  4. Company W captures the $20 based on the original authorization
  5. Company receives settlement of transaction of +$20 (in reality less any concurrent fees charged by the network, acquiring bank etc.)
  6. Buyer decides he only needs two widgets and requests an RMA (return merchandise authorization)
  7. Company issues an RMA to Buyer
  8. Buyer ships two widgets back to Company
  9. Company W receives the two widgets
  10. Company issues a refund of $10 to the Buyer (hopefully referencing the original transaction)
  11. $10 from Company’s account are moved out.
  12. $10 appear in Buyer’s account.

Now at each step of that process after the first, there is a recorded CC transaction and monies which exist in some bank account somewhere. And at each step of the way, the location and/or owner of those monies changes. Until the warehouse ships, the $20 is still in the buyer’s bank account, but has been reserved by the authorization request and cannot be spent. However, as the item has still not shipped, the legal owner of the money is still the buyer. After the company ships the items and issues the capture request, the funds are not in the buyer’s account, but have not yet appeared in the company’s account. Yet, the $20 must be registered as belonging to the company (and entered as revenue for that day/week/month/quarter). Once the buyer requests an RMA, however, $10 of that revenue needs to be unrecognized from that same time period (ever wonder why financial statements for months/quarters don’t close the same day the quarters end?). And, of course, the money doesn’t instantly appear in the buyer’s account but there is another round of money movement in the real world.
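
To illustrate the "where is this money and who owns it" bookkeeping for the widget example, here is a small Python sketch. It is a deliberately simplified model under my own assumptions: the bucket names and methods (OrderMoney, in_flight_to_company, request_return, and so on) are invented for this illustration, fees are ignored, and revenue is unrecognized at the moment the RMA is requested, as described in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class OrderMoney:
    """Where one order's money sits at a given moment, in cents."""
    reserved_on_buyer_card: int = 0  # authorized: still the buyer's, but unspendable
    in_flight_to_company: int = 0    # captured, not yet settled
    in_company_account: int = 0      # settled into the company's bank account
    in_flight_to_buyer: int = 0      # refund issued, not yet in the buyer's account
    recognized_revenue: int = 0      # what the books count as revenue

    def authorize(self, cents: int) -> None:
        self.reserved_on_buyer_card += cents

    def capture(self, cents: int) -> None:
        # After shipping: funds leave the buyer and count as revenue,
        # even though they have not reached the company's account yet.
        self.reserved_on_buyer_card -= cents
        self.in_flight_to_company += cents
        self.recognized_revenue += cents

    def settle(self, cents: int) -> None:
        self.in_flight_to_company -= cents
        self.in_company_account += cents

    def request_return(self, cents: int) -> None:
        # The RMA unrecognizes revenue before any money actually moves.
        self.recognized_revenue -= cents

    def issue_refund(self, cents: int) -> None:
        self.in_company_account -= cents
        self.in_flight_to_buyer += cents

    def refund_landed(self, cents: int) -> None:
        self.in_flight_to_buyer -= cents


# Walk the example: $20 purchase, then a $10 partial return.
order = OrderMoney()
order.authorize(2000)
order.capture(2000)
order.settle(2000)
order.request_return(1000)
order.issue_refund(1000)
order.refund_landed(1000)
```

Note that between request_return and issue_refund the books show $10 of revenue while the bank account still holds $20. That gap between recognized revenue and actual cash is the kind of state your reports have to be able to explain.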

Now multiply those states and steps for every transaction you do. Then add in a host of off-the-beaten-path issues like RMAs issued but items not received, items shipped but not received, partially fulfilled orders, chargebacks 29 days after the original payment, etc., and you should be able to see how the question of “How much money did we make last month?” can become a complicated reporting problem.

Foreign Exchange (FX) Fun

Are we having fun yet? Now take the (relatively) straightforward example from above, and add in a different currency where the user is paying with a GBP-based CC but is being authorized and captured in USD. When we do the authorization for $20, the buyer is charged around 17 GBP. However, due to the variable nature of FX, when we issue the refund for $10, the buyer gets refunded 7 GBP instead of the 8.5 he’s expecting. He naturally calls and complains. Did you track which FX rate was used for which part of the transaction?
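
Here is a minimal Python sketch of what tracking the FX rate per leg might look like, using the numbers from the example. The rates (0.85 and 0.70 GBP per USD) are hypothetical values chosen only to reproduce the 17 GBP charge and 7 GBP refund above, and the names (FxLeg, refund_shortfall_gbp) are my own. In practice the conversion is often applied by the card issuer or network, so the sketch assumes you are able to observe or estimate the rate for each leg.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


@dataclass(frozen=True)
class FxLeg:
    kind: str             # "capture" or "refund"
    usd: Decimal          # amount in the merchant's currency
    gbp_per_usd: Decimal  # the rate applied to this specific leg

    @property
    def gbp(self) -> Decimal:
        # Amount the buyer sees on their GBP statement for this leg.
        return (self.usd * self.gbp_per_usd).quantize(PENNY, rounding=ROUND_HALF_UP)


def refund_shortfall_gbp(capture: FxLeg, refund: FxLeg) -> Decimal:
    """GBP the buyer expected (at the capture rate) minus GBP actually refunded."""
    expected = (refund.usd * capture.gbp_per_usd).quantize(PENNY, rounding=ROUND_HALF_UP)
    return expected - refund.gbp


capture = FxLeg("capture", Decimal("20"), Decimal("0.85"))  # buyer charged 17.00 GBP
refund = FxLeg("refund", Decimal("10"), Decimal("0.70"))    # buyer refunded 7.00 GBP
shortfall = refund_shortfall_gbp(capture, refund)           # 1.50 GBP
```

With the rate stored on each leg, the support call becomes answerable: you can show the buyer that the 1.50 GBP difference comes from the rate change between capture and refund rather than from a mistake in the refund amount.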

Or just as much fun, if you have USD and GBP prices and are settling in USD, under which ledger do you put the excess/shortfall which happens due to changes in FX?

Conclusion

If you made it this far, I salute you! You should consider a career in payments engineering. If your company/organization/engineers are struggling with the above complexity, I can help you get clarity about the workflows and paradigms which, although they can’t eliminate the complexity of payments, can at least make them manageable. Send your thoughts, feedback, interest by emailing me at dlee@lucid-operations.com or

The Stack Beneath This Site

This post is about the tools and services used to build and publish this site plus some brief mentions of alternatives and why they weren’t selected. It’s a soup-to-nuts quickie of all of the services and tools used to create, store, and publish this website. Email me or your comments!

I’ll cover the following layers

  • Domain Registration: How you reserve your domain name. In this case lucid-operations.com.
  • DNS service: This is how you associate your domain name with a specific IP (IPv6?) address on the web.
  • Mail: This is how you manage mail to your domain. And yes, some people do still use email.
  • Framework: This is how you convert your content into a viewable website.
  • Hosting: These are the services which serve up that website.
  • Local tools: The programs I use on my local machine to test and build this website.
  • Cost:
  • Future: Some thoughts on changes and directions.

Domain name registration

I chose Gandi as the registrar for my domain name. I liked that their website wasn’t littered with upsells and that they had technically oriented help pages. Obviously there’s a plethora of registrars from which to choose. If you’re happy with the price and their systems, just choose one.

Mail

A long while back I switched most of my mail to FastMail. Their service is really reasonably priced and you don’t have the Big-G (or is it the Big-A) reading your mail over your shoulder. Their servers are fast and I’ve had a great experience with them so far. I actually have them serving as the primary DNS for lucid-operations.com, but you could just as easily have Gandi do that and direct the MX records to FastMail.

Framework

  • Jekyll: I had a few requirements for this:
    • I didn’t want to have to build out a server (or multiple servers).
    • Most of the content is simply content so I didn’t want a backend database with all the associated issues which happen with that.
    • I wanted to be able to test changes locally.
    • I wanted to be able to write my posts in Markdown format

Some options which I considered and rejected were:

  • WordPress: I wanted control over my content and files. I also didn’t like the idea of having to either host a full blown stack or cede fine-grained control to some WordPress host.
  • Squarespace: This and similar services (Wix, Weebly, etc.) have nice WYSIWYG interfaces, really nice templates, and a simple setup, but ultimately they cost you money and control. More concretely, I wanted to be able to update my website using a git push and, as far as I know, none of these offer that.
  • Ghost: A couple of companies I’ve seen are using it as their blog home. It has a nice Markdown-based editing and management system. I could see migrating my content to Ghost if the Jekyll/GitHub Pages system and I part ways.

Hosting

I decided to use GitHub Pages because I already had a GitHub account and I liked the idea of a static website whose content I had maximal control over.

I considered running this website as a Jekyll-generated static site, built into a Docker image, deployed to AWS, and served by an NGINX (Docker) instance. But although it’d be interesting, I don’t think it would offer any real advantages over just using GitHub Pages. I’m already storing the site in GitHub, and I’d just be adding half a dozen or more steps and things which I have to manage, update, and maintain to keep my website up.

That said, if I wanted to run some dynamic web apps, I would probably move these pages down to a sub-domain and put up a master page with links as needed.

Local Tools

These are the tools I use on my local machine, an aged but still remarkably sprightly MacBook Pro (Retina).

  • Mac Terminal.app: It works. I use the system wide clipboard and pbpaste/pbcopy widely and wildly.
  • bash: Because it’s universal and does most everything I need.
  • tmux: Because one shell isn’t enough.
  • git: to control and keep track of changes to the site.
  • Jekyll: has a nice built in server you can run after generating the site.
  • MacVim: To edit the files, because I’m both a mac head and an old *nix head. In one head.
  • Solarized: Because I like consistency. In fact, the basic theme for this website started out as a solarized jekyll theme which I desaturated.
  • Safari: this is probably sub-optimal as both Chrome and Firefox have really solid dev tools. But I only needed to be able to dive into basic elements and css definitions and this does the job.

Cost

Aside from the domain name registration, this website doesn’t cost me any money to serve. Even adding in CloudFlare for https support shouldn’t cost me anything.

Future Thoughts

As mentioned above, I could see migrating to Ghost in the future, but although it is open-source and all that, I’m leery about having to take on the whole stack of backend stuff to run my static blog. I like that I can test changes to this blog with one jekyll command.

One of the drawbacks of using GitHub Pages directly is that you can’t get HTTPS on your custom domain. CloudFlare has a solution for this which I will be testing out next week.