Notes from the "Linked-In: Lessons learned and growth and scalability" session at QCon with Jean-Luc Vaillant.
Their architecture includes:
- Java (trying out some Ruby, adding some C++, as little as possible)
- Oracle 10g and MySQL
- ActiveMQ (tried OracleMQ, doesn’t recommend it)
- Tomcat & Jetty
Graph computations don’t perform very well in a relational database: with large numbers of members and large numbers of connections, the combinatorics can be staggering. On top of that, simple approaches to storing this information would require extensive joining. The best way to get performance was to run the algorithms on the graph in RAM.
That raises the question of how to keep the RAM database in sync at all times. One option is to update the database and inform the other engines of changes through direct RPC, reliable multicast, or JMS. This has the typical problems of two-phase commit.
An alternative approach that LinkedIn has used is to record changes in a transaction log, which each graph engine pulls into RAM as necessary. The implementation is currently Oracle-specific, but the approach is applicable to just about any database.
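A minimal sketch of the log-pull idea, assuming an undirected connection graph and a log in which every change carries an increasing sequence number. `EdgeChange`, `ChangeLog`, `GraphEngine`, `catchUp` and `readSince` are hypothetical names, and the in-memory list only stands in for the database table; the talk did not describe LinkedIn's actual log format or Oracle mechanism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical log entry: one committed change to the connection graph.
final class EdgeChange {
    final long seq;      // increasing position in the log
    final long from;
    final long to;
    final boolean added; // true = connection created, false = connection removed

    EdgeChange(long seq, long from, long to, boolean added) {
        this.seq = seq;
        this.from = from;
        this.to = to;
        this.added = added;
    }
}

// Hypothetical stand-in for the database-side transaction log.
// Writers only append here; they never call the graph engines directly.
final class ChangeLog {
    private final List<EdgeChange> entries = new ArrayList<>();

    synchronized void append(long from, long to, boolean added) {
        entries.add(new EdgeChange(entries.size() + 1, from, to, added));
    }

    // Everything recorded after the given sequence number, in log order.
    synchronized List<EdgeChange> readSince(long afterSeq) {
        List<EdgeChange> out = new ArrayList<>();
        for (EdgeChange c : entries) {
            if (c.seq > afterSeq) {
                out.add(c);
            }
        }
        return out;
    }
}

// Each engine owns its RAM copy and remembers how far into the log it has applied.
final class GraphEngine {
    private final Map<Long, Set<Long>> adjacency = new HashMap<>();
    private long lastAppliedSeq = 0;

    // Pull changes not yet seen; safe to call repeatedly (e.g. on a timer).
    void catchUp(ChangeLog log) {
        for (EdgeChange c : log.readSince(lastAppliedSeq)) {
            if (c.added) {
                link(c.from, c.to);
                link(c.to, c.from);
            } else {
                unlink(c.from, c.to);
                unlink(c.to, c.from);
            }
            lastAppliedSeq = c.seq;
        }
    }

    private void link(long a, long b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    private void unlink(long a, long b) {
        Set<Long> s = adjacency.get(a);
        if (s != null) {
            s.remove(b);
        }
    }

    Set<Long> connections(long member) {
        return Collections.unmodifiableSet(adjacency.getOrDefault(member, Collections.emptySet()));
    }

    long lastAppliedSeq() {
        return lastAppliedSeq;
    }
}
```

Because the database is the single place a change is committed, no engine has to take part in the write; each one catches up at its own pace by asking for entries past `lastAppliedSeq`, and a restarted engine can replay from wherever it left off.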
Once that’s in place, the in-memory techniques for traversing the graph are far less painful: breadth-first traversal finds connections of various degrees, and symmetry lets a search find connections from both sides.
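Both traversals can be sketched in a few lines once the graph is a plain in-memory adjacency map. This assumes a `Map<Long, Set<Long>>` in which connections are mutual; `GraphSearch`, `degreesFrom` and `distance` are hypothetical names, not LinkedIn's code. `distance` uses symmetry by searching roughly half the allowed depth from each member and intersecting the two results, rather than one full-depth search from a single side.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers over an undirected in-memory graph: member id -> connections.
final class GraphSearch {
    private GraphSearch() {}

    // Breadth-first traversal: degree of separation of every member
    // reachable from start within maxDegree hops.
    static Map<Long, Integer> degreesFrom(Map<Long, Set<Long>> graph, long start, int maxDegree) {
        Map<Long, Integer> degree = new HashMap<>();
        ArrayDeque<Long> queue = new ArrayDeque<>();
        degree.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            int d = degree.get(current);
            if (d == maxDegree) {
                continue; // do not expand past the degree limit
            }
            for (long next : graph.getOrDefault(current, Set.of())) {
                if (!degree.containsKey(next)) {
                    degree.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return degree;
    }

    // Meet in the middle: search ceil(max/2) hops from a and floor(max/2) from b,
    // then take the cheapest member seen by both. Returns -1 if farther than maxDegree.
    static int distance(Map<Long, Set<Long>> graph, long a, long b, int maxDegree) {
        int fromBDepth = maxDegree / 2;
        Map<Long, Integer> fromA = degreesFrom(graph, a, maxDegree - fromBDepth);
        Map<Long, Integer> fromB = degreesFrom(graph, b, fromBDepth);
        int best = -1;
        for (Map.Entry<Long, Integer> e : fromB.entrySet()) {
            Integer da = fromA.get(e.getKey());
            if (da != null) {
                int total = da + e.getValue();
                if (best < 0 || total < best) {
                    best = total;
                }
            }
        }
        return best;
    }
}
```

With a fan-out of b connections per member, a one-sided search to depth d visits on the order of b^d members, while two half-depth searches visit on the order of 2·b^(d/2), which is why searching from both sides pays off on a dense network.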
Having run into issues with read-write locks, he prefers copy-on-write.
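A sketch of what copy-on-write can look like for an in-memory graph, assuming readers far outnumber writers. `CowGraph`, `snapshot` and `connect` are hypothetical names, and the talk did not say which read-write lock issues came up. Readers take the current immutable snapshot with one atomic read and never block; writers build a modified copy and publish it in a single step.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write adjacency store: the published map and its sets are never mutated.
final class CowGraph {
    private final AtomicReference<Map<Long, Set<Long>>> current =
            new AtomicReference<>(Collections.emptyMap());

    // Lock-free read: the returned snapshot stays consistent for an entire traversal,
    // even if writers publish newer versions in the meantime.
    Map<Long, Set<Long>> snapshot() {
        return current.get();
    }

    // Writers are serialized so that two concurrent updates cannot overwrite each other.
    synchronized void connect(long a, long b) {
        // Shallow copy of the outer map; untouched member sets are shared, not copied.
        Map<Long, Set<Long>> copy = new HashMap<>(current.get());
        copy.put(a, with(copy.get(a), b));
        copy.put(b, with(copy.get(b), a));
        current.set(Collections.unmodifiableMap(copy)); // publish the new version
    }

    // New read-only set containing existing members plus one more.
    private static Set<Long> with(Set<Long> existing, long member) {
        Set<Long> s = (existing == null) ? new HashSet<>() : new HashSet<>(existing);
        s.add(member);
        return Collections.unmodifiableSet(s);
    }
}
```

The trade-off is that every write pays for a copy of the outer map, so this sketch only makes sense when reads dominate; a real engine would need to keep that copy cost down (for example by applying log entries in batches), which is an assumption here rather than something from the talk.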
LinkedIn is the largest professional networking site in the world. LinkedIn employees presented two sessions about their server architecture at JavaOne 2008. This post contains a summary of these presentations.
Key topics include:
- Up-to-date statistics about the LinkedIn user base and activity level
- The evolution of the LinkedIn architecture, from 2003 to 2008
- “The Cloud”, the specialized server that maintains the LinkedIn network graph
- Their communication architecture