Why Perforce is more scalable than Git
Okay, say you work at a company that uses Perforce (on Windows). So you're happily tapping away using Perforce for years and years. Perforce is pretty fast -- I mean, it has this "nocompress" option that you can tweak and turn on and off depending on where you are, and it generally lets you get your work done. If you change your client spec, it synchronizes only the files it needs to. Wow, that blows the mind! Perforce is great, why would you ever need anything else? And it's way better than CVS.
Suddenly you have to clone something with git, and BAM! The world is changed. You feel it in the water. You feel it in the earth. You smell it in the air. Once you've experienced git, there is no going back, man. Git is the stuff man. You might have checked out firefox -- but have you checked out firefox ooon GIT?
So many really obvious things are missing in p4. Want to restore your source tree to a pristine state? "git clean -fd". Want to store your changes temporarily to work on something else? "git stash". Share some code with a cube-mate without checking in? "git push". Want to automatically detect out of bounds array accesses and add missing semicolons to all your code? "git umm-nice-try"
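For instance, the quick stash-and-fix dance looks something like this (a minimal sketch; the branch name is made up):

    git stash                  # park your half-finished changes
    git checkout -b quickfix   # hop onto a throwaway branch and fix the bug
    # ...commit the fix, switch back...
    git checkout master
    git stash pop              # pick up right where you left off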
Branching on git is like opening a new tab in a browser. It's a piece of cake. You can branch for EVERY SINGLE BUGFIX. And you wrote the code, so you get to merge it back in, because you are the expert.
Branching on Perforce is kind of like performing open heart surgery. It should only be done by professionals: experts in the art who really know what they are doing. You have to create a "branch spec" file using a special syntax. If you screw up, the entire company will know and forever deride you as the idiot who deleted "//depot/main". The merging is done by gatekeepers. Hope they know what they're doing!
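For reference, the ritual goes roughly like this (a sketch only; the depot paths are made up):

    p4 branch feature-x          # opens a form where you type the mapping, e.g.:
    #     View: //depot/main/... //depot/feature-x/...
    p4 integrate -b feature-x    # populate the branch
    p4 submit
    # ...and merging back later is its own adventure:
    p4 integrate -b feature-x -r
    p4 resolve
    p4 submit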
Now, if you have been using git for a few days you might discover this tool called "git-p4". "AHA!" you might say, "I can import from my company's p4 server into git and work from that, and then submit the changes back when I am done." But you would be wrong, for a number of reasons.
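The intended workflow certainly looks appealing (sketched here with a made-up depot path; on older installs the command is spelled git-p4 rather than git p4):

    git p4 clone //depot/project@all project   # import the p4 history into a git repo
    cd project
    # ...hack away on local branches...
    git p4 rebase                              # pull in new p4 changes
    git p4 submit                              # push your commits back as changelists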
git-p4 can't handle large repositories

Really. It's just a big Python script, and it works by downloading the entire p4 repository into a Python object, then writing it into git. If your repo is more than a couple of gigs, you'll be out of memory faster than you can skim reddit.
But that problem's fixable. I was able to hack up git-p4 to do things a file at a time in about an hour. The real problem is:
Git can't handle large repositories

Okay this is subjective because it depends on your definition of large. When I say large, I mean about 6 gigs or so. Because your company's source tree is probably that large. If you have the power, you will use it. Maybe you check in binaries of all your build tools, or maybe for some reason you need to check in the object files of the nightly builds, or something silly like that. P4 can handle this because it runs on a cluster of servers somewhere in the bowels of your company's IT department, administered by an army of drones tending to its every need. It has been developed since 1995 to handle the strain. Google also uses Perforce, and when it started to show its strain, Larry Page personally went to Perforce's headquarters and threatened to direct large amounts of web traffic up their executives' whazzoos until they did something about it.
Git has none of that. The typical git user considers the Linux kernel to be a "large project". If you've watched Linus's git rant at Google, listen to how he sidesteps the question of scalability.
Don't believe me? Fine. Go ahead and wait a minute after every git command while it scans your entire repo. It's maddening because it's long enough to be annoying, but not long enough to skim Geekologie.
The solution

You know what? I don't think many people really use distributed source control. The centralized model is here to stay. Most git users (especially those using Github) use the centralized model anyway.
Ask yourself this: Is it really that important to duplicate the entire history on every single PC? Do you really need to peruse changelist 1 of KDE from an airplane? In most cases, NO. What you really want is the other stuff: easy branching, clean, stash, and the ability to transfer changes to another client. The distributed stuff isn't really asked for, or needed. It just makes it hard to learn.
Just give me a version control system that lets me do these things and I'll be happy:
* Let me "stash" stuff cause it's really handy. Clean is nice to have too.
* Make branching easy.
* Let me merge changes into my coworker's repos, without having to check them in first.
* Don't waste 40% of my disk space with a .git folder, when this could be stored on a central server.
Is that really so hard?
I happen to use Perforce at work, and Git for personal projects. I don't dislike Perforce, even though it has the drawbacks you describe. I like Git, now that it has good Windows support, particularly for its branching capability, which is sorely missing in Perforce.
E.g., when you're working on a feature and then have to do a quick fix... touching one of the files you've already changed for the feature.
Perforce also has good visual tools (P4V). The time-lapse view and the revision graph are particularly powerful.
I know Perforce is appreciated in the game industry, precisely for the reason you mention: large repository handling, and particularly large file handling. Game assets (images, maps, sounds, movies, etc.) can take up a lot of space!
But Git didn't want to be left behind. So, seven years after your article, Git has a special feature to handle large assets. I haven't tried it, and I don't know if it is on par with Perforce, but it is here.
On the other hand, Perforce evolved too: they let you shelve files (the equivalent of stash), which also lets you share uncommitted code with co-workers.
They also let you work offline, in case the server is down (or not reachable)...
Still no easy branching, reserved for the company's Perforce gurus...
And still using this pesky read-only attribute...
Here is a correction to this article and a list of updates to Perforce that change some of the things described here (can't blame it for being written a while back; the world changes).
Re: "So many really obvious things are missing in p4." …
::Want to restore your source tree to a pristine state? "git clean -fd".
--> As of Perforce 2014.1, the "p4 clean" command does this.
::Want to store your changes temporarily to work on something else? "git stash".
--> This has been possible with the "p4 shelve" command since P4 2009.2.
::Share some code with a cube-mate without checking in? "git push".
--> There are ways to do this (shelving is one; see the sketch below), but creating a branch for every person or code fix isn't a typical way of doing business in P4.
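For what it's worth, the shelve-based sharing flow looks something like this (a sketch; the changelist number is made up):

    p4 shelve -c 12345      # park your opened files on the server
    # your cube-mate then runs:
    p4 unshelve -s 12345    # pull the shelved files into their own workspace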
Re: Branching, git vs. P4
::Branching on Perforce is kind of like performing open heart surgery. It should only be done by professionals: experts in the art who really know what they are doing. You have to create a "branch spec" file using a special syntax.
--> This really has never been true. Branch specs are helpful but not required. If you understand branching strategy for your team/group/company, this isn't difficult at all. Merging, on the other hand, can be ugly if you do it wrong and submit the changes. That's true with any SC system.
::If you screw up, the entire company will know and forever deride you as the idiot who deleted "//depot/main".
--> You can't really delete a branch by branching. By merging, sure. This is what rollback is for.
en.wikipedia.org/wiki/Repo_(script)
basically, repo allows you to combine different git repositories together.
In the case of android, each hardware company (eg Qualcomm for their radio, Broadcom for their bluetooth/wifi) will have separate git repositories for each component.
Repo manages all the git repositories automatically (you can still control git yourself)
(yes, it still wouldn't solve problems of having many large binary blobs and calculating md5sums for them)
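For anyone curious, typical repo usage is just this (a sketch; substitute the manifest URL for your project):

    repo init -u <manifest-url>   # fetch the manifest listing all the git repositories
    repo sync                     # clone/update every repository it names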
(And for those of us outside the USA, being able to work offline is a must, but I can see how not everyone will care about that.)
Finally, you're talking about disk space. Mind telling me why I need to have double the disk space available with Perforce just to be able to switch quickly between any 2 branches? I have actually run out of disk space just because of this and have lost valuable productive time trying to free up enough space to check out another branch. Never again.
On the other hand, I'm using Perforce right now. Turns out that even a simple merge, check-in, or branch operation is slow. The client continuously polls the server, sometimes crashes if you leave it running long enough, and you must rely on the network and servers for every little thing you want to do. Yes, shelving relies on the server; the server even keeps track of what I have and what I don't, with the obvious desynchronization issues.
For each file, if the cost of a checksum is less than the cost of downloading the whole file, they should try to do an incremental transfer.
I really want the local branches and the freedom from having to check out files, and we've gone without a server update since 2002 (it was that or health insurance -- that bad), but it's just become too big a time sink for me to even investigate it anymore.
So I thought, "hey! I'll just take a fresh install and make it a git repo." This worked to some degree but some of the mods had large files. Eventually when I went to switch to a different branch it just died with an out of memory error.
This is because git has to store the whole file in memory to process it. The machine I was using has 6GB of RAM, but on Windows most builds of git are 32-bit.
Bam. Dead in the water. I had to actually boot up Ubuntu on a live disc, apt-get install the 64-bit version of git just to swap branches. Fail; plain and simple.
It sucks to have a designer create a great tool like git only to have him also be too lazy to solve some edge cases for others.
* File sizes > RAM? This should be doable in a slower way only when needed.
* File sizes > 32-bit version capabilities? Again fix it but have it use the slower algorithm only when needed.
* 32-bit only version... Seriously, most new computers other than netbooks have 64-bit capability these days. Just make 64-bit the default.
Being too stuck up to solve this problem, which would obviously increase adoption of your tool, just seems dumb. And those who say that with a >6GB repo you're doing something wrong -- or who don't have large repos, or don't version large files -- obviously haven't run across a business need to do so. When your paycheck requires it, you'll be singing a different tune.
I used Perforce when I worked at Google and will likely use it again in my next company for which I just got hired. I like it but I know I am going to miss features from a DVCS. I used Bazaar at my last company and it was quite nice but also suffers from the same problem as git and I believe hg.
Zero downtime. No administration needed. What else can one ask for?
>but the number of files that is painful
That's exactly my case. We tried to migrate a WebMethods repository containing lots of services (corporate scale, all currently used/deployed, and it cannot be split into submodules/subtrees). It contains something like 100k files, and doing a simple git status took about 10 minutes of disk IO while it was scanning for changes.
www.jaredoberhaus.com/tech_notes/2008/12/git-is-slow-too-many-lstat-operations.html
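Depending on your Git version, some of that lstat pain can be dialed down with config knobs (a sketch; core.fscache is specific to Git for Windows builds, and availability varies by version):

    git config core.preloadindex true   # stat the index in parallel
    git config core.fscache true        # cache filesystem stat results on Windows
    git status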
git is clearly designed for what I would call "small" projects like the Linux kernel. If you want to do another project, you do not add it to an existing git repository, you make another one. This best fits with pushing and pulling a single project. But if you have a large system that is composed of many such smaller projects, you have to use something other than the source control system to synchronize their dependencies.
Translated:
>"I don't need distributed source control, so I know nobody out there will need it, as I don't see why they should. But they WILL need to move 6gb repos, because I do, so that's what normal people needs."
In short: different people, different needs. I'm the happiest SCM user since I switched to git for my <6GB projects, which doesn't mean it has to fit everyone and every possible project, for the same reason I don't use vim to edit jpg files.
Or you could go my way and have your .git folder actually be a symlink to a folder on another machine over ssh.
Please. When you argue about this stuff, please research thoroughly. There are a lot of things you can do with git that just take a while to learn.
Git is like really good drugs.
It works well under DVCS tools such as Git and Mercurial (though a lack of branch naming is sometimes an issue, depending upon the tool) - it works absolutely blindingly under Clearcase - unfortunately for Clearcase it is expensive IBM software, and the hardware constraints on that tool (particularly for dynamic views) make it a compromise also.
Perforce, CVS, and Subversion are cut from the same cloth, however - they are light-years behind the branching capabilities of DVCSs, and also of Clearcase, which has had fantastic branching semantics available since the mid-90s.
BTW, rather than store the data in the repo, we've started storing the git hashes with the data. Works nicely.
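Presumably something along these lines (a sketch; the file names are made up):

    # compute the blob id git would assign, and keep it alongside the asset
    git hash-object big_asset.bin > big_asset.bin.githash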
Why would you ever expect git to work well in a centralized usage scenario? Would you expect p4 to work well in a distributed use case? Honestly, dude...Apples and Oranges.
And what happens when that central server is inaccessible? or when you're travelling to a trade show with a demo and you have a really cool idea on the plane you'd like to try out? P4 can be a real pain in the proverbial wazoo in those circumstances.
You'll want to get some p4api and set P4API_BASE to the directory where you untar it; this lets the plugin use the C++ bindings for perforce instead of running the command-line client.
Look at Documentation/vcs-git-p4.txt for how to configure it; you generally end up actually getting data simply with "git fetch origin" (or "git fetch" if you apply the bugfix I forgot to send back from work).
I'm in the games biz myself and we ran into these problems with svn. Once we got past a certain size team and asset base, it started to really choke. I wrote up a little postmortem at scottbilas.com about our experience with it (search for 'svn').
We tried really hard to make svn work because of the astronomical price of P4. A price that we all grudgingly pay again and again in this industry because everything else is so much worse.
My current plan is to clone the commands from git into our command line p4 extension tool we have (it does things like auto-creating Crucible code reviews and such). For example, 'stash' should be pretty easy to implement. Actually, it already exists. Search the p4 public depot for 'p4tar'. I haven't tried it out yet.
Anyway, the other commands should be implementable with a tool on top of p4 using p4api.net. If only I had some spare time... :)
Of course with the Perl Perforce repository, the size was something like 450MB in Perforce and 70MB in Git, once the crazy metadata format used by Perforce's insane integration system was appropriately grokked.
I mean, don't get me wrong, I think Perforce is a great product - beats SVN hands-down in design and was around many years before - it's just too complex. Integration is badly modelled, hardly anyone understands it properly. So in that respect, Perforce doesn't scale to very large teams because the branching model is too hard to work with.
Yes of course Git doesn't do a lot of that product release cycle development / Software Configuration Management. It's unix: it does one thing and does it well.
IMHO, a version/revision control tool, with all its diff, 3-way-merging, and compressed delta storage goodies, is at its best when it's storing editable source. Storing binary data, especially binary data that can be recreated from the version-controlled source, is not the ideal use for this kind of system. That said, I've done it too, because I also believe that every version of the source should include the tools used to process the source into the product shipped to the customer. But I would like to consider the use of a different paradigm for the archiving of binary data, especially mongo BLOBs. I would like to consider a system more ideally suited to storing Big Honkin binary files, and have a reference to those BLOBs in the version control system. Now I wonder what would work.....
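A bare-bones version of that paradigm might look like this (purely hypothetical paths; it is roughly what tools like git-annex, and later Git LFS, formalize):

    # stash the BLOB in a content-addressed store, commit only a tiny pointer
    sha=$(sha1sum big_movie.avi | cut -d' ' -f1)
    cp big_movie.avi /mnt/asset-store/$sha
    echo $sha > big_movie.avi.ptr
    git add big_movie.avi.ptr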
Why? What's the big deal with checking in? Use a personal branch, and have your bunker-mate use one too. Check in your WIP on a regular basis, just in case your drive goes kablooie.
{quote}Let me "stash" stuff cause it's really handy. Clean is nice to have too.{/quote}
I must be missing something. Wouldn't a personal branch work just fine for this?
{quote}Make branching easy. {/quote}
Branching in Perforce is difficult for users who don't understand the nuances of client workspace mapping. When you understand how the repository is structured, and how your local hard drive is laid out, it becomes so much easier. If you don't know the structure of the repository, which contains the family jewels, please turn in your coder's badge. If you don't know how your own disc is structured, please turn in your computer.
{quote}Don't waste 40% of my disk space with a .git folder, when this could be stored on a central server. {/quote}
Good idea. I'm curious -- let's say we had a multi-TB repository, with 80k files on just one tip, tens of thousands of branches, 1600 coders, 11 locations, 8 time zones. If we were using GIT, and I wanted to work disconnected from the network for a couple of days, what would be "gotten" onto my laptop?
If you have that large a repo, it's probably because you're stuffing large binary blobs into git. If you're stuffing large binary blobs into git, you need to look into the .gitattributes file so that git won't try to diff/compress said large binary files. It's got some heuristics to try and recognize them, but making its work a bit easier is sure to show you some gain.
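A sketch of what that .gitattributes might contain (the patterns are examples only):

    # treat these as opaque binaries: no text conversion, no diff, no delta compression
    *.png   binary -delta
    *.psd   binary -delta
    *.zip   binary -delta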
However, for 99% of the software developers out there, git (or one of its DVCS brethren) just works. In those cases, the benefits of being entirely mobile, having near-zero time cost for most actions, and the ability to easily experiment with the contents of the repository are game-changing wins. For the top 1%, there are tools like Clearcase and Perforce.
Thanks,
John
So yeah, it's scalable, but it's directly proportional to the size of the server it's on.
In the meantime, it's still just easier to dump stuff into P4. Beefy 64-bit P4 servers are cheap to build now.
it seems as if there is a specific problem with Git, namely it doesn't handle large binary files well (large images, artwork, etc).
Has anyone actually taken this specific use-case to the Git developers on the mailing list?
Second, it seems like your problem could be solved by having a separate machine to run Git just for your Binary assets. When you need to make a build, you just dump all those files to the machine, have it version the directory, and then include that 'version' into your Git source repo.
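In other words, something like this (a sketch with made-up paths and ids; the asset repo lives on its own box):

    # on the asset machine
    cd /srv/assets
    git add -A && git commit -m "assets for build 1234"
    git rev-parse HEAD            # prints the snapshot id, e.g. 3f2a...

    # in the source repo, record which asset snapshot this build used
    echo 3f2a... > ASSET_VERSION
    git add ASSET_VERSION && git commit -m "pin asset snapshot for build 1234"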
Interesting post.
Any web company with non-source code in their repo will run into the same thing. I'm surprised more people haven't pointed out this glaring problem with the git model.
The number of simultaneous clients that can be doing operations is just as important, if not more so. P4 was notorious for holding locks far longer than necessary, and clients would queue up for minutes at a time (I remember syncs that would take more than half an hour on a fairly small repository because there were a hundred other clients trying to sync).
P4 does *not*, in fact, scale well (although I admit that more recent versions of P4 are better than what I was using in 2004).
I feel pretty confident your assessment of git would be different if you had 1,000 coworkers using your P4 repository at the same time.
As far as flexibility goes, Git is awesome, so stop posting things that don't make sense.
Then you've never been responsible for builds in the games biz...
A lot of teams try out Alien Brain, and quickly realize that
1. It's structured like Visual Source Safe or CVS (in other words you aren't REALLY versioning changes, just files, and that's really bad), and
2. Versioning artwork against code is just as important as versioning one code change against another or one artwork change against another, and having your artwork and your code in different version control systems, even when they're both structured around atomic changes (which Alien Brain isn't) causes problems.
So most teams just dump artwork, intermediate data files, and all sorts of things in the same p4 depot that their code is in. And it works like a champ. Except that p4 is missing so many of the cool features that git gives you.
But not much later:
"Let me merge changes into my coworker's repos, without having to check them in first."
That would be distributed stuff.
(All SCM GUIs suck, imo, but that's my CLI-bias bleeding through)
Well you may have identified another use case where Git is not ideal - really large binary blobs. I think the problem is Git has to checksum (sorry SHA1) all files it scans - and that would take some time on a 36GB file.
To be fair, Git has always been advertised as a SCM - i.e. a source-code management system - and for that use-case it absolutely rocks IMO. Personally I would still investigate a hybrid approach where you have the option of pulling just the source down to your lappy with Git, so if you are on the plane and you DO want to look at change-set 1 at least you can!
Bypassing the central respository to share patches... meh. This I do not see as a feature -- if there's a central repository, it should be used as the mechanism of communication between developers.
On the other hand, "stashing" stuff is really nice. And branching (and merging) *should* be easy. I'm all over those two requests.
As for wasting my disk space... meh. Sometimes I care, sometimes I don't (disk is cheap, but disk fills up faster still). Having an option for git to use either a local or a remote (central/blessed) repository would be nice.
Disclaimer: I still use CVS, I've used Perforce (and liked it), and I use git (and like it), and I don't currently have an repositories that approach the sizes discussed in the article.
My one experience of Perforce was doing work with another company remotely. Our VPN was unfortunately a bit dodgy. Combining that with Perforce led to an incredibly frustrating experience.
I wouldn't recommend using it if you're not on a LAN.
Agree with your points about scalability. Git is not good for anything other than source code (medium # of small text files).
Would that be your coworkers distributed repository, by any chance?
To merge changes into a coworker's repo, why can't they just patch a CL? You don't have to submit a CL for a coworker to grab the changes.
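For instance (a sketch; the exact patch -p level depends on how your client maps depot paths to local ones):

    p4 diff -du > fix.patch    # unified diff of your opened files against the depot
    # coworker, after p4 edit-ing the same files:
    patch -p0 < fix.patch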
Really, for us, p4 works great. It stays out of our way, it's faster than anything out there. It's not distributed, but we don't care about that.
Git, plain and simple, does not scale to large repositories. That's OK, I guess, it's not really designed to handle that use case.
The solution? Track each project as a single Git repository, and if you need to tie them together, create a master repository that includes each one as a sub-module. The flexibility you gain from 'setting free' your individual projects is enormous, as is the smart use of a master repository that uses branches to create different mash-ups of your overall code-base.
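Roughly (a sketch; the repository URLs are placeholders):

    git init mashup && cd mashup
    git submodule add git://example.com/engine.git engine
    git submodule add git://example.com/art-tools.git tools
    git commit -m "master repo pinning each project at a known-good revision"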